Abstract

In this paper we consider solving noisy under-determined systems of linear equations with sparse solutions. The noiseless counterpart has attracted enormous attention in recent years, above all due to the work of [12, 13, 25], where it was shown in a statistical and large-dimensional context that a sparse unknown vector (of sparsity proportional to the length of the vector) can be recovered from an under-determined system via a simple polynomial-time $\ell_1$-optimization algorithm. [13] further established that even when the equations are noisy, one can, through an SOCP noisy equivalent of $\ell_1$, obtain an approximate solution that is (in an $\ell_2$-norm sense) no further than a constant times the noise from the sparse unknown vector. In our recent works [62, 63], we created a powerful mechanism that helped us characterize exactly the performance of $\ell_1$-optimization in the noiseless case (as shown in [61], and as it must be if the axioms of mathematics are well set, the results of [62, 63] are in absolute agreement with the corresponding exact ones from [25]). In this paper we design a mechanism, as powerful as those from [62, 63], that can handle the analysis of a LASSO type of algorithm (and many others) that can be (or typically are) used for "solving" noisy under-determined systems. Using the mechanism we then, in a statistical context, compute the exact worst-case $\ell_2$-norm distance between the unknown sparse vector and the approximate one obtained through such a LASSO. The obtained results match the corresponding exact ones obtained in [6, 26]. Moreover, as a by-product of our analysis framework we recognize the existence of an SOCP type of algorithm that achieves the same performance.

Index Terms: Noisy linear systems of equations; LASSO; SOCP; $\ell_1$-optimization; compressed sensing



1 Introduction 

In recent years the problem of finding sparse solutions of under-determined systems of linear equations has attracted enormous attention. The range of applications seems vast and appears to grow almost daily (see, e.g. [4, 10, 14, 22, 30, 45, 49, 53, 55-57, 69, 71] and references therein). Given the substantial interest in the problem (and especially that it is coming from a variety of different fields), one may assume that designing efficient algorithms that solve it could be of far-reaching importance. To that end, we believe that a precise mathematical understanding of the phenomena that make certain algorithms work well would help solidify belief in their success in current and future applications. Moreover, it is possible that down the road it can also help expand the range of their applications.

Along the same lines, in this paper we focus on studying mathematical properties of under-determined systems of linear equations and certain algorithms used to solve them. We start the story by introducing an idealized version of the problem that we plan to study. In its simplest form it amounts to finding a $k$-sparse $x$ such that

$$Ax = y \tag{1}$$



where $A$ is an $m \times n$ ($m < n$) matrix and $y$ is an $m \times 1$ vector (see Figure 1; here and in the rest of the paper, by a $k$-sparse vector we mean a vector that has at most $k$ nonzero components). Of course, the assumption will be that such an $x$ exists (clearly, the case of real interest is $k < m$). To make writing in the rest of the paper easier, we will assume the so-called linear regime, i.e. we will assume that $k = \beta n$ and that the number of equations is $m = \alpha n$, where $\alpha$ and $\beta$ are constants independent of $n$ (more on the non-linear regime, i.e. on the regime when $m$ is larger than linearly proportional to $k$, can be found in e.g. [21, 34, 35]).

Figure 1: Model of a linear system; vector $x$ is $k$-sparse

If one has the freedom to design the matrix $A$, then the results from [2, 46, 52] demonstrate that techniques from coding theory (based on coding/decoding of Reed-Solomon codes) can be employed to determine any $k$-sparse $x$ in (1) for any $0 < \alpha \le 1$ and any $\beta \le \alpha/2$ in polynomial time. It is relatively easy to show that under the unique recoverability assumption $\beta$ cannot be greater than $\alpha/2$. Therefore, as long as one is concerned with the unique recovery of a $k$-sparse $x$ in (1) in polynomial time, the results from [2, 46, 52] are optimal. The complexity of the algorithms from [2, 46, 52] is roughly $O(n^3)$. In a similar fashion one can, instead of using coding/decoding techniques associated with Reed-Solomon codes, design the matrix and the corresponding recovery algorithm based on the techniques related to coding/decoding of Expander codes (see e.g. [41, 42, 72] and references therein). In that case recovering $x$ in (1) is significantly faster for large dimensions $n$. Namely, the complexity of the techniques from e.g. [41, 42, 72] (or their slight modifications) is usually $O(n)$, which for large $n$ is clearly significantly smaller than $O(n^3)$. However, the techniques based on coding/decoding of Expander codes usually do not allow $\beta$ to be as large as $\alpha/2$.

On the other hand, if one has no freedom in the choice of $A$, designing algorithms to find the $k$-sparse $x$ in (1) is substantially harder. In fact, when there is no choice in $A$ the recovery problem (1) becomes NP-hard. Two algorithms, 1) orthogonal matching pursuit (OMP) and 2) basis pursuit (BP), i.e. $\ell_1$-optimization, (and their different variations) have often been viewed historically as solid heuristics for solving (1) (in recent years belief propagation type algorithms have been emerging as strong alternatives as well). Roughly speaking, OMP algorithms are faster but can recover smaller sparsity, whereas the BP ones are slower but recover higher sparsity. More precisely, under certain probabilistic assumptions on the elements of $A$ it can be shown (see e.g. [51, 66, 67]) that if $m = O(k \log(n))$, OMP (or slightly modified OMP) can recover $x$ in (1) with complexity of recovery $O(n^2)$. On the other hand, a stage-wise OMP from [29] recovers $x$ in (1) with complexity of recovery $O(n \log n)$. Somewhere in between OMP and BP are the recent improvements CoSAMP (see e.g. [50]) and Subspace pursuit (see e.g. [23]), which guarantee (assuming the linear regime) that the $k$-sparse $x$ in (1) can be recovered in polynomial time with $m = O(k)$ equations, which is the same performance guarantee established in [13, 25] for the BP.
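To make the greedy idea behind OMP concrete, the following is a minimal sketch of its textbook form in plain Python. It is only an illustration: the helper names `solve_linear` and `omp` are hypothetical, the least-squares step uses the normal equations, and none of the refinements of the variants analyzed in [51, 66, 67] or [29] are included. The sketch assumes that $A$ has no zero columns and that the selected columns are linearly independent.

```python
import math


def solve_linear(G, b):
    """Solve G z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(G)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * z[j] for j in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def omp(A, y, k):
    """Run k greedy steps of orthogonal matching pursuit for y = A x."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    norms = [math.sqrt(sum(c * c for c in col)) for col in cols]
    support, coef, residual = [], [], y[:]
    for _ in range(k):
        # Pick the unused column most correlated with the current residual.
        scores = [
            abs(sum(cols[j][i] * residual[i] for i in range(m))) / norms[j]
            if j not in support else -1.0
            for j in range(n)
        ]
        support.append(max(range(n), key=lambda j: scores[j]))
        # Least squares restricted to the support, via the normal equations.
        G = [[sum(cols[a][i] * cols[b][i] for i in range(m)) for b in support]
             for a in support]
        rhs = [sum(cols[a][i] * y[i] for i in range(m)) for a in support]
        coef = solve_linear(G, rhs)
        residual = [y[i] - sum(coef[t] * cols[s][i] for t, s in enumerate(support))
                    for i in range(m)]
    x_hat = [0.0] * n
    for t, s in enumerate(support):
        x_hat[s] = coef[t]
    return x_hat
```

Calling `omp(A, y, k)` with a small matrix and a $k$-sparse right-hand side returns a vector supported on the $k$ selected columns; each step costs one pass over all columns plus a small least-squares solve.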
We now introduce the BP concept (or, as we will refer to it, the $\ell_1$-optimization concept; a slight modification/adaptation of it will actually be the main topic of this paper). Variations of the standard $\ell_1$-optimization from e.g. [15, 19, 60], as well as those from [24, 32, 37-39, 59] related to $\ell_q$-optimization, $0 < q < 1$, are possible as well; moreover, they can all be incorporated in what we will present below. The $\ell_1$-optimization concept suggests that one can maybe find the $k$-sparse $x$ in (1) by solving the following $\ell_1$-norm minimization problem

$$\min \|x\|_1 \quad \text{subject to} \quad Ax = y. \tag{2}$$

As is then shown in [13], if $\alpha$ and $n$ are given, and $A$ is given and satisfies the restricted isometry property (RIP) (the interested reader can find more on this property in e.g. [1, 5, 11-13, 58]), then any unknown vector $x$ with no more than $k = \beta n$ (where $\beta$ is a constant dependent on $\alpha$ and explicitly calculated in [13]) non-zero elements can indeed be recovered by solving (2). In a statistical and large-dimensional context, in [25] and later in [63], for any given value of $\beta$ the exact value of the maximum possible $\alpha$ was determined.

As we mentioned earlier, the above scenario is in a sense idealistic. Namely, it assumes that $y$ in (2) was obtained through (1). On the other hand, in many applications only a noisy version of $Ax$ may be available for $y$ (this is especially so in measuring type of applications); see, e.g. [12, 13, 40, 70]. When that happens one has the following equivalent to (1) (see Figure 2)

$$y = Ax + v, \tag{3}$$

where $v$ is an $m \times 1$ vector (often dubbed the noise vector; the so-called ideal case presented above is of course a special case of the noisy case). Finding the $k$-sparse $x$ in (3) is now incredibly hard. Basically, one is looking for a $k$-sparse $x$ such that (3) holds and on top of that $v$ is unknown. Although the problem is hard, there are various heuristics throughout the literature that one can use to solve it approximately. Below we restrict our attention to two groups of algorithms that we believe are the most relevant to the results that we will present.

Figure 2: Model of a linear system; vector $x$ is $k$-sparse

To introduce a bit of tractability in finding the $k$-sparse $x$ in (3) one usually assumes a certain amount of knowledge about either $x$ or $v$. As far as tractability assumptions on $v$ are concerned, one typically (and possibly fairly reasonably in applications of interest) assumes that $\|v\|_2$ is bounded (or highly likely to be bounded) from above by a certain known quantity. The following second-order cone programming (SOCP) analogue to (2) is one of the approaches that utilizes such an assumption (see, e.g. [13])

$$\min_x \|x\|_1 \quad \text{subject to} \quad \|y - Ax\|_2 \le r \tag{4}$$

where $r$ is a quantity such that $\|v\|_2 \le r$ (or $r$ is a quantity such that $\|v\|_2 \le r$ is, say, highly likely). For example, in [13] a statistical context is assumed and, based on the statistics of $v$, $r$ was chosen such that $\|v\|_2 \le r$ happens with overwhelming probability (as usual, by overwhelming probability we mean in this paper a probability that is no more than a number exponentially decaying in $n$ away from 1). Given that (4) is now among the few almost standard choices when it comes to finding the $k$-sparse $x$ in (3), the literature on its properties is vast (see, e.g. [13, 28, 65] and references therein). Also, given that this SOCP will not be the main topic of this paper, we below briefly mention only what we consider to be the most influential work on this topic in recent years. Namely, in [13] the authors analyzed the performance of (4) and showed a result similar in flavor to the one that holds in the ideal (noiseless) case. In a nutshell, the following was shown in [13]: let $x$ be a $\beta n$-sparse vector such that (3) holds and let $x_{socp}$ be the solution of (4). Then $\|x_{socp} - x\|_2 \le Cr$, where $\beta$ is a constant independent of $n$ and $C$ is a constant independent of $n$ and of course dependent on $\alpha$ and $\beta$. This result in a sense establishes a noisy equivalent to the fact that a linear sparsity can be recovered from an under-determined system of linear equations. In informal language, it states that a linear sparsity can be approximately recovered in polynomial time from a noisy under-determined system, with the norm of the recovery error guaranteed to be within a constant multiple of the noise norm. Establishing such a result is, of course, a feat in its own class, not only because of its technical contribution but even more so because of the amount of interest that it generated in the field.

In this paper we will also consider an approximate recovery of the $k$-sparse $x$ in (3). However, instead of the above mentioned SOCP we will focus on a group of highly successful algorithms called LASSO (the LASSO algorithms, as well as the SOCP ones, are of course well known in the statistics community and there is again a vast literature that covers their performance; see, e.g. [6, 9, 17, 18, 26, 48, 64, 68] and references therein). There are many variants of LASSO but the following one is probably the most well known

$$\min_x \|y - Ax\|_2^2 + \lambda_{lasso}\|x\|_1. \tag{5}$$

$\lambda_{lasso}$ in (5) is a parameter to be chosen based on the amount of pre-knowledge one may have about $A$, $v$, and/or $x$. Results that characterize the approximation error of (5) and that are similar to the SOCP ones mentioned above can be established (see, e.g. [7]). Of course, characterizing the performance of the recovery algorithm through the $\ell_2$ norm of the error vector is only one possible way among many (more on other measures of performance can be found in e.g. [9, 70]). In this paper we will develop a novel framework for performance characterization of the LASSO algorithms. Among other things, in a statistical context, the framework will enable us to provide a precise characterization of the $\ell_2$ norm of the approximation error of the LASSO algorithms.
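As a concrete illustration of how a problem of the form (5) can be solved numerically, the sketch below runs plain proximal-gradient (ISTA) iterations in pure Python. This is one standard solver chosen here for illustration and is not the method analyzed in this paper; the function names, the problem sizes in `demo`, and the value of `lam` are all assumptions made for the example.

```python
import random


def soft_threshold(z, tau):
    """Entrywise soft-thresholding: the proximal map of tau * ||.||_1."""
    return [max(abs(v) - tau, 0.0) * (1.0 if v > 0 else -1.0) for v in z]


def lasso_objective(A, y, x, lam):
    """Value of ||y - A x||_2^2 + lam * ||x||_1, the objective in (5)."""
    r = [sum(a * xi for a, xi in zip(row, x)) - yi for row, yi in zip(A, y)]
    return sum(v * v for v in r) + lam * sum(abs(v) for v in x)


def ista(A, y, lam, iters=500):
    """Proximal-gradient iterations for min ||y - A x||_2^2 + lam * ||x||_1."""
    m, n = len(A), len(A[0])
    # 2 * ||A||_F^2 upper-bounds the Lipschitz constant of the gradient,
    # so the step 1 / lip never increases the objective.
    lip = 2.0 * sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [2.0 * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = soft_threshold([x[j] - grad[j] / lip for j in range(n)], lam / lip)
    return x


def demo(seed=0, m=20, n=40, k=3, sigma=0.1, lam=1.0):
    """Small synthetic instance of (3) with Gaussian A and Gaussian noise."""
    rng = random.Random(seed)
    A = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
    x_true = [0.0] * (n - k) + [1.0] * k
    y = [sum(a * xi for a, xi in zip(row, x_true)) + sigma * rng.gauss(0, 1)
         for row in A]
    return A, y, x_true, ista(A, y, lam), lam
```

The conservative step size keeps the sketch short at the cost of slow convergence; accelerated or coordinate-descent solvers are usually preferred in practice.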

While our main focus in this paper is on algorithms from the LASSO group, we mention that besides the SOCP and LASSO algorithms there are of course various other algorithms/heuristics that have been suggested as possible alternatives throughout the literature in recent years. One such alternative that gained a certain amount of popularity is, for example, the so-called Dantzig selector introduced in [16]. The Dantzig selector amounts to solving the following optimization problem

$$\min \|x\|_1 \quad \text{subject to} \quad \|A^T(Ax - y)\|_\infty \le C_{Dan},$$

where $C_{Dan}$ is a carefully chosen parameter that of course should depend on $A$, $v$, and/or $x$. As a linear program the Dantzig selector promises to be faster than SOCP or LASSO, which are both quadratic programs. On the other hand, recent improvements in numerical implementations of LASSO's and their solid approximate recovery abilities make them quite competitive as well (a thorough discussion/comparison of the advantages/disadvantages of the Dantzig selector and the LASSO algorithms can be found in e.g. [3, 8, 31, 33, 43, 44, 47]).

To facilitate the exposition and make it easier to follow, we will present our framework on a version of the LASSO from (5). Namely, we will consider

$$\min_x \|y - Ax\|_2 \quad \text{subject to} \quad \|x\|_1 \le \|\tilde{x}\|_1 \tag{6}$$

where $\tilde{x}$ is the original $k$-sparse $x$ that satisfies (3) (we just briefly mention that in the context that will be considered in this paper it is not that difficult to transform the LASSO from (6) to one that is structurally equivalent to (5); however, we stop short of exploring this connection further before presenting our main results and only mention that a section towards the end of the paper will explore it in more detail). We do however mention right here that in order to run (6) one does require the knowledge of $\|\tilde{x}\|_1$. In a sense this requirement is an equivalent to setting $r$ and $\lambda_{lasso}$ in (4) and (5), respectively. In order to be maximally effective, both $r$ and $\lambda_{lasso}$ do require some amount of pre-knowledge about $A$, $v$, and/or $x$.
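A problem of the form (6) can be attacked numerically by projected gradient descent onto the $\ell_1$ ball. The sketch below is a minimal pure-Python illustration under stated assumptions: it minimizes the squared residual (which has the same minimizers as the residual norm in (6)), it uses the standard sort-based $\ell_1$-ball projection, and all function names are hypothetical. The radius plays the role of $\|\tilde{x}\|_1$ and is assumed strictly positive.

```python
import math
import random


def project_l1_ball(v, radius):
    """Euclidean projection of v onto {x : ||x||_1 <= radius}, radius > 0."""
    if sum(abs(t) for t in v) <= radius:
        return v[:]
    u = sorted((abs(t) for t in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - radius) / j
        if uj - t > 0:
            theta = t  # the last j for which this holds gives the threshold
    return [max(abs(t) - theta, 0.0) * (1.0 if t > 0 else -1.0) for t in v]


def residual_norm(A, y, x):
    """||y - A x||_2, the objective in (6)."""
    return math.sqrt(sum((sum(a * xi for a, xi in zip(row, x)) - yi) ** 2
                         for row, yi in zip(A, y)))


def constrained_lasso(A, y, radius, iters=500):
    """Projected gradient for min ||y - A x||_2 subject to ||x||_1 <= radius."""
    m, n = len(A), len(A[0])
    # Step 1 / lip with lip = 2 * ||A||_F^2 guarantees monotone descent
    # of the squared residual.
    lip = 2.0 * sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [2.0 * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = project_l1_ball([x[j] - grad[j] / lip for j in range(n)], radius)
    return x
```

In an experiment one would generate $A$ and $v$ as in (3), pass `radius = sum(abs(t) for t in x_true)`, and compare the returned vector with the original sparse one.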

Before we proceed further we briefly summarize the organization of the rest of the paper. In Section 2, we present a statistical framework for the performance analysis of the LASSO algorithms. To demonstrate its power, towards the end of Section 2 we compute, for any given $\alpha$ and $\beta$, the worst-case $\ell_2$ norm of the error that (6) makes when used for approximate recovery of general sparse signals $x$ from (3). In Section 3 we then specialize the results from Section 2 to the so-called signed vectors $x$. In Section 4 we discuss how the LASSO from (6) can be connected to the LASSO from (5). In Section 5 we demonstrate that there is an SOCP algorithm (similar to the one given in (4)) that achieves the same performance as do (6) and a corresponding (5). In Section 6 we present results that we obtained through numerical experiments. Finally, in Section 7 we discuss the obtained results.

2 LASSO's performance analysis framework - general x 

In this section we create a statistical framework for the performance analysis of LASSO. Before proceeding further we explicitly state the major assumptions that we will make (the remaining ones will be made appropriately throughout the analysis). Namely, in the rest of the paper we will assume that the elements of $A$ are i.i.d. standard normal random variables. We will also assume that the elements of $v$ are i.i.d. Gaussian random variables with zero mean and variance $\sigma^2$. As stated earlier, we will assume that $\tilde{x}$ is the original $x$ in (3) that we are trying to recover and that it is any $k$-sparse vector with a given fixed location of its nonzero elements and a given fixed combination of their signs. Since the analysis (and the performance of (6)) will clearly not depend on what particular location and what particular combination of signs of nonzero elements are chosen, we can for simplicity of the exposition and without loss of generality assume that the components $\tilde{x}_1, \tilde{x}_2, \dots, \tilde{x}_{n-k}$ of $\tilde{x}$ are equal to zero and the components $\tilde{x}_{n-k+1}, \tilde{x}_{n-k+2}, \dots, \tilde{x}_n$ of $\tilde{x}$ are greater than or equal to zero. Moreover, throughout the paper we will call such an $\tilde{x}$ $k$-sparse and positive. In a more formal way we will set

$$\tilde{x}_1 = \tilde{x}_2 = \dots = \tilde{x}_{n-k} = 0$$
$$\tilde{x}_{n-k+1} \ge 0, \; \tilde{x}_{n-k+2} \ge 0, \; \dots, \; \tilde{x}_n \ge 0. \tag{7}$$

We also now take the opportunity to point out a rather obvious detail. Namely, the fact that $\tilde{x}$ is positive is assumed for the purpose of the analysis. However, this fact is not known a priori and is not available to the solving algorithm (this will of course change in Section 3).

Once we establish the framework it will be clear that it can be used to characterize many of the LASSO features. We will defer these details to a collection of forthcoming papers. In this paper we will present only a small application that relates to a classical question of quantifying the approximation error that (6) makes when used to recover any $k$-sparse $\tilde{x}$ that satisfies (3) and is from a set of $\tilde{x}$'s with a given fixed location of nonzero elements and a given fixed combination of their signs.

Before proceeding further we will introduce a few definitions that will be useful in formalizing this application as well as in conducting the entire analysis. As is natural, we start with the solution of (6). Let $\hat{x}$ be the solution of (6) and let $w_{lasso} \in R^n$ be such that

$$\hat{x} = \tilde{x} + w_{lasso}. \tag{8}$$

As an application of our framework we will compute the largest possible value of $\|\hat{x} - \tilde{x}\|_2 = \|w_{lasso}\|_2$ for any combination $(\alpha, \beta)$. Or more rigorously, for any combination $(\alpha, \beta)$, we will find a $d_{lasso}$ such that

$$\lim_{n \to \infty} P\left(d_{lasso} - \epsilon \le \max_{\tilde{x}} \|w_{lasso}\|_2 \le d_{lasso} + \epsilon\right) = 1 \tag{9}$$

for an arbitrarily small constant $\epsilon$. However, before doing so we will first present the general framework. The framework that we will present will center around finding the optimal value of the objective function in (6) (of course in a probabilistic context). In the first of the following two subsections we will create a lower bound on this optimal value. We will then afterwards, in the second of the subsections, create an upper bound on this optimal value. Naturally, in the third subsection we will show that the two bounds actually match. To make further writing easier and clearer we set already here

$$\zeta_{obj} = \min_x \|y - Ax\|_2 \quad \text{subject to} \quad \|x\|_1 \le \|\tilde{x}\|_1. \tag{10}$$

2.1 Lower-bounding $\zeta_{obj}$

In this section we present the part of the framework that relates to finding a "high-probability" lower bound on $\zeta_{obj}$. To make the arguments that will follow less tedious we will make an assumption that is significantly weaker than what we will eventually prove. Namely, we will assume that there is a (if necessary, arbitrarily large) constant $C_w$ such that

$$P(\|w_{lasso}\|_2 \le C_w) \ge 1 - e^{-\epsilon_{C_w} n}. \tag{11}$$

To make our arguments flow more naturally, one should probably provide a direct proof of this statement right here. However, given the difficulty of the task ahead we refrain from that and assume that the statement is correct. Roughly speaking, what we assume is that $\|w_{lasso}\|_2$ is bounded by an arbitrarily large constant (of course we hope to create a machinery that can prove much more than (11)).

We start by noting that if one knows that $y = A\tilde{x} + v$ holds then (10) can be rewritten as

$$\min_x \|v + A\tilde{x} - Ax\|_2 \quad \text{subject to} \quad \|x\|_1 \le \|\tilde{x}\|_1. \tag{12}$$






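After a small change of variables, $x = \tilde{x} + w$, (12) becomes

$$\min_w \|v - Aw\|_2 \quad \text{subject to} \quad \|\tilde{x} + w\|_1 \le \|\tilde{x}\|_1, \tag{13}$$

or in a more compact form

$$\min_w \left\| A_v \begin{bmatrix} w \\ \sigma \end{bmatrix} \right\|_2 \quad \text{subject to} \quad \|\tilde{x} + w\|_1 \le \|\tilde{x}\|_1, \tag{14}$$

where $A_v = [-A \;\; v]$ is now an $m \times (n+1)$ random matrix with i.i.d. standard normal components (the last column is the noise vector normalized by $\sigma$, which is what makes its entries standard normal). Let

$$S_w(\sigma, \tilde{x}, C_w) = \left\{ [w^T \; \sigma]^T \in R^{n+1} \;\middle|\; \|w\|_2 \le C_w \text{ and } \|\tilde{x} + w\|_1 \le \|\tilde{x}\|_1 \right\}. \tag{15}$$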



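Further, let

$$f_{obj}(\sigma, w) = \left\| A_v \begin{bmatrix} w \\ \sigma \end{bmatrix} \right\|_2 \tag{16}$$

and set

$$\zeta_{obj}^{(help)} = \min_{[w^T \sigma]^T \in S_w(\sigma,\tilde{x},C_w)} f_{obj}(\sigma, w) = \min_{[w^T \sigma]^T \in S_w(\sigma,\tilde{x},C_w)} \left\| A_v \begin{bmatrix} w \\ \sigma \end{bmatrix} \right\|_2 = \min_{[w^T \sigma]^T \in S_w(\sigma,\tilde{x},C_w)} \max_{\|a\|_2 = 1} a^T A_v \begin{bmatrix} w \\ \sigma \end{bmatrix}. \tag{17}$$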
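The two algebraic facts used in passing from (13) to (14) and in the last equality of (17) can be checked numerically. The short script below is only a sanity check under the assumption stated with (14), namely that the last column of $A_v$ is the noise vector normalized by $\sigma$; the variable names and problem sizes are illustrative choices.

```python
import math
import random

rng = random.Random(1)
m, n, sigma = 4, 6, 0.5
A = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
v_unit = [rng.gauss(0, 1) for _ in range(m)]   # standard normal column
v = [sigma * t for t in v_unit]                # noise with standard deviation sigma
w = [rng.gauss(0, 1) for _ in range(n)]

# Left-hand side: v - A w, the vector inside the norm in (13).
lhs = [v[i] - sum(A[i][j] * w[j] for j in range(n)) for i in range(m)]

# Right-hand side: A_v [w; sigma] with A_v = [-A, v_unit], as in (14).
A_v = [[-a for a in A[i]] + [v_unit[i]] for i in range(m)]
u = w + [sigma]
rhs = [sum(A_v[i][j] * u[j] for j in range(n + 1)) for i in range(m)]

# The maximum of a^T z over unit vectors a is attained at a = z / ||z||_2
# and equals ||z||_2, which is the last equality in (17).
norm = math.sqrt(sum(t * t for t in rhs))
a_star = [t / norm for t in rhs]
inner = sum(a * t for a, t in zip(a_star, rhs))
```

After running the script, `lhs` and `rhs` agree entrywise up to rounding, and `inner` equals `norm` up to rounding.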



We now state a lemma from [36] that will be of use in what follows. 

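Lemma 1. ([36]) Let $A$ be an $m \times n$ matrix with i.i.d. standard normal components. Let $g$ and $h$ be $m \times 1$ and $n \times 1$ vectors, respectively, with i.i.d. standard normal components. Also, let $g$ (when it appears without an index, as a scalar) be a standard normal random variable and let $\Phi \subset R^n$ be an arbitrary subset. Then for all choices of real $\psi_\phi$,

$$P\left(\min_{\phi \in \Phi} \max_{\|a\|_2 = 1} \left(a^T A \phi + \|\phi\|_2\, g - \psi_\phi\right) \ge 0\right) \ge P\left(\min_{\phi \in \Phi} \max_{\|a\|_2 = 1} \left(\|\phi\|_2 \sum_{i=1}^m g_i a_i + \sum_{i=1}^n h_i \phi_i - \psi_\phi\right) \ge 0\right). \tag{18}$$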



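Now, after applying Lemma 1 one has

$$\begin{aligned}
P\left(\min_{[w^T \sigma]^T \in S_w(\sigma,\tilde{x},C_w)} \left(f_{obj}(\sigma, w) + \sqrt{\|w\|_2^2 + \sigma^2}\, g\right) \ge \zeta_{obj}^{(l)}\right)
&= P\left(\min_{[w^T \sigma]^T \in S_w(\sigma,\tilde{x},C_w)} \max_{\|a\|_2 = 1} \left(a^T A_v \begin{bmatrix} w \\ \sigma \end{bmatrix} + \sqrt{\|w\|_2^2 + \sigma^2}\, g\right) \ge \zeta_{obj}^{(l)}\right) \\
&\ge P\left(\min_{[w^T \sigma]^T \in S_w(\sigma,\tilde{x},C_w)} \max_{\|a\|_2 = 1} \left(\sqrt{\|w\|_2^2 + \sigma^2} \sum_{i=1}^m g_i a_i + \sum_{i=1}^n h_i w_i + h_{n+1}\sigma\right) \ge \zeta_{obj}^{(l)}\right).
\end{aligned} \tag{19}$$

In what follows we will analyze the following probability

$$p_l = P\left(\min_{[w^T \sigma]^T \in S_w(\sigma,\tilde{x},C_w)} \max_{\|a\|_2 = 1} \left(\sqrt{\|w\|_2^2 + \sigma^2} \sum_{i=1}^m g_i a_i + \sum_{i=1}^n h_i w_i + h_{n+1}\sigma\right) \ge \zeta_{obj}^{(l)}\right) \tag{20}$$

which is of course nothing but the probability on the right-hand side of the inequality in (19). We will essentially show that for a certain $\zeta_{obj}^{(l)}$ this probability is close to 1. That will rather obviously imply that we have a "high probability" lower bound on $\zeta_{obj}$. To that end, we first note that the maximization over $a$ is trivial and one obtains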



$$p_l = P\left(\min_{[w^T \sigma]^T \in S_w(\sigma,\tilde{x},C_w)} \left(\sqrt{\|w\|_2^2 + \sigma^2}\, \|g\|_2 + \sum_{i=1}^n h_i w_i + h_{n+1}\sigma\right) \ge \zeta_{obj}^{(l)}\right). \tag{21}$$

To facilitate the exposition that will follow let

$$\xi(\sigma, g, h, \tilde{x}) = \min_{[w^T \sigma]^T \in S_w(\sigma,\tilde{x},C_w)} \left(\sqrt{\|w\|_2^2 + \sigma^2}\, \|g\|_2 + \sum_{i=1}^n h_i w_i\right). \tag{22}$$

One should note here that, although present in the definition of $S_w$, $\sigma$ clearly does not have an impact on the result of the above optimization. Now we split the analysis into two parts. The first one will be the deterministic analysis of $\xi(\sigma, g, h, \tilde{x})$ and will be presented in Subsection 2.1.1. In the second part (that will be presented in Subsection 2.1.2) we will use the results of such a deterministic analysis and continue the above probabilistic analysis, applying various concentration results.

2.1.1 Optimizing $\xi(\sigma, g, h, \tilde{x})$

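In this section we compute $\xi(\sigma, g, h, \tilde{x})$. We first rewrite the optimization problem from (22) in the following possibly clearer form

$$\begin{aligned}
\xi(\sigma, g, h, \tilde{x}) = \min_w \quad & \sqrt{\|w\|_2^2 + \sigma^2}\, \|g\|_2 + \sum_{i=1}^n h_i w_i \\
\text{subject to} \quad & \|\tilde{x} + w\|_1 \le \|\tilde{x}\|_1 \\
& \sqrt{\|w\|_2^2 + \sigma^2} \le \sqrt{C_w^2 + \sigma^2}.
\end{aligned} \tag{23}$$

To remove the absolute values we introduce auxiliary variables $t_i$, $1 \le i \le n$, and transform the above problem to

$$\begin{aligned}
\xi(\sigma, g, h, \tilde{x}) = \min_{w, t} \quad & \sqrt{\|w\|_2^2 + \sigma^2}\, \|g\|_2 + \sum_{i=1}^n h_i w_i \\
\text{subject to} \quad & \sum_{i=1}^n t_i \le \|\tilde{x}\|_1 \\
& \tilde{x}_i + w_i - t_i \le 0, \quad n-k+1 \le i \le n \\
& -\tilde{x}_i - w_i - t_i \le 0, \quad n-k+1 \le i \le n \\
& w_i - t_i \le 0, \quad 1 \le i \le n-k \\
& -w_i - t_i \le 0, \quad 1 \le i \le n-k \\
& \sqrt{\|w\|_2^2 + \sigma^2} \le \sqrt{C_w^2 + \sigma^2}.
\end{aligned} \tag{24}$$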
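The Lagrangian of the above problem is then

$$\begin{aligned}
\mathcal{L}(\nu, \lambda^{(1)}, \lambda^{(2)}, w, t, \gamma) = {} & \sqrt{\|w\|_2^2 + \sigma^2}\, \|g\|_2 + \sum_{i=1}^n h_i w_i + \nu \sum_{i=1}^n t_i - \nu \|\tilde{x}\|_1 + \sum_{i=n-k+1}^n \lambda_i^{(1)} (\tilde{x}_i + w_i - t_i) \\
& + \sum_{i=n-k+1}^n \lambda_i^{(2)} (-\tilde{x}_i - w_i - t_i) + \sum_{i=1}^{n-k} \lambda_i^{(1)} (w_i - t_i) + \sum_{i=1}^{n-k} \lambda_i^{(2)} (-w_i - t_i) \\
& + \gamma \left(\sqrt{\|w\|_2^2 + \sigma^2} - \sqrt{C_w^2 + \sigma^2}\right).
\end{aligned} \tag{25}$$

After rearranging the terms we further have

$$\begin{aligned}
\mathcal{L}(\nu, \lambda^{(1)}, \lambda^{(2)}, w, t, \gamma) = {} & \sqrt{\|w\|_2^2 + \sigma^2}\, \|g\|_2 + \sum_{i=1}^n h_i w_i - \nu \|\tilde{x}\|_1 + \sum_{i=1}^n t_i (\nu - \lambda_i^{(1)} - \lambda_i^{(2)}) + \sum_{i=n-k+1}^n \lambda_i^{(1)} (\tilde{x}_i + w_i) \\
& + \sum_{i=n-k+1}^n \lambda_i^{(2)} (-\tilde{x}_i - w_i) + \sum_{i=1}^{n-k} \lambda_i^{(1)} w_i - \sum_{i=1}^{n-k} \lambda_i^{(2)} w_i + \gamma \left(\sqrt{\|w\|_2^2 + \sigma^2} - \sqrt{C_w^2 + \sigma^2}\right).
\end{aligned} \tag{26}$$

After a few further rearrangements we finally have

$$\begin{aligned}
\mathcal{L}(\nu, \lambda^{(1)}, \lambda^{(2)}, w, t, \gamma) = {} & \sqrt{\|w\|_2^2 + \sigma^2}\, (\|g\|_2 + \gamma) + \sum_{i=1}^n h_i w_i - \nu \|\tilde{x}\|_1 + \sum_{i=1}^n t_i (\nu - \lambda_i^{(1)} - \lambda_i^{(2)}) \\
& + \sum_{i=n-k+1}^n (\lambda_i^{(1)} - \lambda_i^{(2)}) \tilde{x}_i + \sum_{i=1}^n (\lambda_i^{(1)} - \lambda_i^{(2)}) w_i - \gamma \sqrt{C_w^2 + \sigma^2}.
\end{aligned} \tag{27}$$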

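Setting $\nu - \lambda_i^{(1)} - \lambda_i^{(2)} = 0$, $1 \le i \le n$ (to ensure that the dual is bounded), and combining (24) and (27) is enough to obtain

$$\begin{aligned}
\xi(\sigma, g, h, \tilde{x}) = \max_{\nu, \lambda^{(1)}, \lambda^{(2)}, \gamma} \min_{w, t} \quad & \mathcal{L}(\nu, \lambda^{(1)}, \lambda^{(2)}, w, t, \gamma) \\
\text{subject to} \quad & \lambda_j^{(i)} \ge 0, \quad 1 \le j \le n, \; 1 \le i \le 2 \\
& \nu \ge 0 \\
& \nu - \lambda_i^{(1)} - \lambda_i^{(2)} = 0, \quad 1 \le i \le n \\
& \gamma \ge 0,
\end{aligned} \tag{28}$$

where we of course use the fact that strong duality holds. After removing the minimization over $t$ we have

$$\begin{aligned}
\xi(\sigma, g, h, \tilde{x}) = \max_{\nu, \lambda^{(1)}, \lambda^{(2)}, \gamma} \min_{w} \quad & \mathcal{L}_1(\nu, \lambda^{(1)}, \lambda^{(2)}, w, \gamma) \\
\text{subject to} \quad & \lambda_j^{(i)} \ge 0, \quad 1 \le j \le n, \; 1 \le i \le 2 \\
& \nu \ge 0 \\
& \nu - \lambda_i^{(1)} - \lambda_i^{(2)} = 0, \quad 1 \le i \le n \\
& \gamma \ge 0,
\end{aligned} \tag{29}$$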
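where

$$\mathcal{L}_1(\nu, \lambda^{(1)}, \lambda^{(2)}, w, \gamma) = \sqrt{\|w\|_2^2 + \sigma^2}\, (\|g\|_2 + \gamma) + \sum_{i=1}^n h_i w_i - \nu \|\tilde{x}\|_1 + \sum_{i=n-k+1}^n (\lambda_i^{(1)} - \lambda_i^{(2)}) \tilde{x}_i + \sum_{i=1}^n (\lambda_i^{(1)} - \lambda_i^{(2)}) w_i - \gamma \sqrt{C_w^2 + \sigma^2}. \tag{30}$$

The inner minimization over $w$ is now doable. Setting the derivatives with respect to $w_i$ to zero one obtains

$$\frac{w (\|g\|_2 + \gamma)}{\sqrt{\|w\|_2^2 + \sigma^2}} + (h + \lambda^{(1)} - \lambda^{(2)}) = 0, \tag{31}$$

where $\lambda^{(1)} = [\lambda_1^{(1)}, \lambda_2^{(1)}, \dots, \lambda_n^{(1)}]^T$ and $\lambda^{(2)} = [\lambda_1^{(2)}, \lambda_2^{(2)}, \dots, \lambda_n^{(2)}]^T$. From (31) one then has

$$w (\|g\|_2 + \gamma) = -\sqrt{\|w\|_2^2 + \sigma^2}\, (h + \lambda^{(1)} - \lambda^{(2)}) \tag{32}$$

or in a norm form

$$\|w\|_2^2 (\|g\|_2 + \gamma)^2 = (\|w\|_2^2 + \sigma^2) \|h + \lambda^{(1)} - \lambda^{(2)}\|_2^2. \tag{33}$$

From (33) we then find

$$\|w_{sol}\|_2 = \frac{\sigma \|h + \lambda^{(1)} - \lambda^{(2)}\|_2}{\sqrt{(\|g\|_2 + \gamma)^2 - \|h + \lambda^{(1)} - \lambda^{(2)}\|_2^2}} \tag{34}$$

and from (32)

$$w_{sol} = -\frac{\sigma (h + \lambda^{(1)} - \lambda^{(2)})}{\sqrt{(\|g\|_2 + \gamma)^2 - \|h + \lambda^{(1)} - \lambda^{(2)}\|_2^2}} \tag{35}$$

where $w_{sol}$ is of course the solution of the inner minimization over $w$. Now, one should note that (34) and (35) are of course possible only if $\|g\|_2 + \gamma - \|h + \lambda^{(1)} - \lambda^{(2)}\|_2 \ge 0$. Later in the paper we will recognize that, for $\lambda^{(1)}$ and $\lambda^{(2)}$ that are optimal in (29), validity of this condition essentially implies the regime (in the $(\alpha, \beta)$ plane) where the worst-case $\|w\|_2$ is finite with overwhelming probability (or equivalently, if for such $\lambda^{(1)}$ and $\lambda^{(2)}$ the condition is not valid then for the corresponding $(\alpha, \beta)$ the worst-case $\|w\|_2$ is infinite with overwhelming probability). Plugging the value of $w_{sol}$ from (35) back in (29) gives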

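$$\begin{aligned}
\xi(\sigma, g, h, \tilde{x}) = \max_{\nu, \lambda^{(1)}, \lambda^{(2)}, \gamma} \quad & \sigma \sqrt{(\|g\|_2 + \gamma)^2 - \|h + \lambda^{(1)} - \lambda^{(2)}\|_2^2} - \nu \|\tilde{x}\|_1 + \sum_{i=n-k+1}^n (\lambda_i^{(1)} - \lambda_i^{(2)}) \tilde{x}_i - \gamma \sqrt{C_w^2 + \sigma^2} \\
\text{subject to} \quad & \lambda_j^{(i)} \ge 0, \quad 1 \le j \le n, \; 1 \le i \le 2 \\
& \nu \ge 0 \\
& \nu - \lambda_i^{(1)} - \lambda_i^{(2)} = 0, \quad 1 \le i \le n \\
& \|g\|_2 + \gamma - \|h + \lambda^{(1)} - \lambda^{(2)}\|_2 \ge 0 \\
& \gamma \ge 0.
\end{aligned} \tag{36}$$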
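For readers who want the intermediate step, the substitution of (35) into (30) that produces the first term of (36) can be written out as follows. The shorthand $D$ is introduced here only for this display and is not part of the original notation.

```latex
\begin{align*}
D &:= (\|g\|_2+\gamma)^2 - \|h+\lambda^{(1)}-\lambda^{(2)}\|_2^2, \\
\sqrt{\|w_{sol}\|_2^2+\sigma^2}
  &= \sqrt{\frac{\sigma^2\|h+\lambda^{(1)}-\lambda^{(2)}\|_2^2}{D}+\sigma^2}
   = \frac{\sigma(\|g\|_2+\gamma)}{\sqrt{D}}, \\
\sqrt{\|w_{sol}\|_2^2+\sigma^2}\,(\|g\|_2+\gamma)
  + (h+\lambda^{(1)}-\lambda^{(2)})^T w_{sol}
  &= \frac{\sigma(\|g\|_2+\gamma)^2}{\sqrt{D}}
   - \frac{\sigma\|h+\lambda^{(1)}-\lambda^{(2)}\|_2^2}{\sqrt{D}}
   = \sigma\sqrt{D}.
\end{align*}
```

The remaining terms of (30) do not depend on $w$ and carry over to (36) unchanged.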

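Let $z^{(1)} = [1, 1, \dots, 1]^T$. By plugging the constraint $\lambda^{(1)} = \nu z^{(1)} - \lambda^{(2)}$ back into the objective function and making sure that $\nu - \lambda_i^{(2)} \ge 0$, $1 \le i \le n$, one can remove $\lambda^{(1)}$ from the above optimization and get the following

$$\begin{aligned}
\xi(\sigma, g, h, \tilde{x}) = \max_{\nu, \lambda^{(2)}, \gamma} \quad & \sigma \sqrt{(\|g\|_2 + \gamma)^2 - \|h + \nu z^{(1)} - 2\lambda^{(2)}\|_2^2} - \nu \|\tilde{x}\|_1 + \sum_{i=n-k+1}^n (\nu - 2\lambda_i^{(2)}) \tilde{x}_i - \gamma \sqrt{C_w^2 + \sigma^2} \\
\text{subject to} \quad & \nu \ge 0 \\
& 0 \le \lambda_i^{(2)} \le \nu, \quad 1 \le i \le n \\
& \|g\|_2 + \gamma - \|h + \nu z^{(1)} - 2\lambda^{(2)}\|_2 \ge 0 \\
& \gamma \ge 0.
\end{aligned} \tag{37}$$

Since we assumed that $\tilde{x}_i \ge 0$, $n-k+1 \le i \le n$, and $\tilde{x}_i = 0$, $1 \le i \le n-k$, one then from (37) has

$$\begin{aligned}
\xi(\sigma, g, h, \tilde{x}) = \max_{\nu, \lambda^{(2)}, \gamma} \quad & \sigma \sqrt{(\|g\|_2 + \gamma)^2 - \|h + \nu z^{(1)} - 2\lambda^{(2)}\|_2^2} - 2 \sum_{i=n-k+1}^n \lambda_i^{(2)} \tilde{x}_i - \gamma \sqrt{C_w^2 + \sigma^2} \\
\text{subject to} \quad & \nu \ge 0 \\
& 0 \le \lambda_i^{(2)} \le \nu, \quad 1 \le i \le n \\
& \|g\|_2 + \gamma - \|h + \nu z^{(1)} - 2\lambda^{(2)}\|_2 \ge 0 \\
& \gamma \ge 0.
\end{aligned} \tag{38}$$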

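After a simple scaling of $\lambda^{(2)}$ one finds that the following is an equivalent to (38)

$$\begin{aligned}
\xi(\sigma, g, h, \tilde{x}) = \max_{\nu, \lambda^{(2)}, \gamma} \quad & \sigma \sqrt{(\|g\|_2 + \gamma)^2 - \|h + \nu z^{(1)} - \lambda^{(2)}\|_2^2} - \sum_{i=n-k+1}^n \lambda_i^{(2)} \tilde{x}_i - \gamma \sqrt{C_w^2 + \sigma^2} \\
\text{subject to} \quad & \nu \ge 0 \\
& 0 \le \lambda_i^{(2)} \le 2\nu, \quad 1 \le i \le n \\
& \|g\|_2 + \gamma - \|h + \nu z^{(1)} - \lambda^{(2)}\|_2 \ge 0 \\
& \gamma \ge 0.
\end{aligned} \tag{39}$$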



Now, the maximization over $\gamma$ can be done. After setting the derivative to zero one finds

$$\frac{\sigma (\|g\|_2 + \gamma)}{\sqrt{(\|g\|_2 + \gamma)^2 - \|h + \nu z^{(1)} - \lambda^{(2)}\|_2^2}} - \sqrt{C_w^2 + \sigma^2} = 0 \tag{40}$$

and after some algebra

$$\gamma_{opt} = \sqrt{1 + \frac{\sigma^2}{C_w^2}}\, \|h + \nu z^{(1)} - \lambda^{(2)}\|_2 - \|g\|_2, \tag{41}$$

where of course $\gamma_{opt}$ would be the solution of (39) only if larger than or equal to zero. Otherwise, of course, $\gamma_{opt} = 0$. Now, based on these two scenarios we distinguish two different optimization problems:
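1. The "overwhelming" optimization

$$\begin{aligned}
\xi_{ov}(\sigma, g, h, \tilde{x}) = \max_{\nu, \lambda^{(2)}} \quad & \sigma \sqrt{\|g\|_2^2 - \|h + \nu z^{(1)} - \lambda^{(2)}\|_2^2} - \sum_{i=n-k+1}^n \lambda_i^{(2)} \tilde{x}_i \\
\text{subject to} \quad & \nu \ge 0 \\
& 0 \le \lambda_i^{(2)} \le 2\nu, \quad 1 \le i \le n.
\end{aligned} \tag{42}$$

2. The "non-overwhelming" optimization

$$\begin{aligned}
\xi_{nov}(\sigma, g, h, \tilde{x}) = \max_{\nu, \lambda^{(2)}} \quad & \sqrt{C_w^2 + \sigma^2}\, \|g\|_2 - C_w \|h + \nu z^{(1)} - \lambda^{(2)}\|_2 - \sum_{i=n-k+1}^n \lambda_i^{(2)} \tilde{x}_i \\
\text{subject to} \quad & \nu \ge 0 \\
& 0 \le \lambda_i^{(2)} \le 2\nu, \quad 1 \le i \le n.
\end{aligned} \tag{43}$$

The "overwhelming" optimization is the equivalent to (39) if for its optimal values $\hat{\nu}$ and $\hat{\lambda}^{(2)}$ it holds that

$$\sqrt{1 + \frac{\sigma^2}{C_w^2}}\, \|h + \hat{\nu} z^{(1)} - \hat{\lambda}^{(2)}\|_2 \le \|g\|_2. \tag{44}$$

We now summarize in the following lemma the results of this subsection.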



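Lemma 2. Let $\hat{\nu}$ and $\hat{\lambda}^{(2)}$ be the solutions of (42) and analogously let $\tilde{\nu}$ and $\tilde{\lambda}^{(2)}$ be the solutions of (43). Let $\xi(\sigma, g, h, \tilde{x})$ be, as defined in (22), the optimal value of the objective function in (22). Then

$$\xi(\sigma, g, h, \tilde{x}) = \begin{cases}
\sigma \sqrt{\|g\|_2^2 - \|h + \hat{\nu} z^{(1)} - \hat{\lambda}^{(2)}\|_2^2} - \sum_{i=n-k+1}^n \hat{\lambda}_i^{(2)} \tilde{x}_i, & \text{if } \sqrt{1 + \frac{\sigma^2}{C_w^2}}\, \|h + \hat{\nu} z^{(1)} - \hat{\lambda}^{(2)}\|_2 \le \|g\|_2 \\
\sqrt{C_w^2 + \sigma^2}\, \|g\|_2 - C_w \|h + \tilde{\nu} z^{(1)} - \tilde{\lambda}^{(2)}\|_2 - \sum_{i=n-k+1}^n \tilde{\lambda}_i^{(2)} \tilde{x}_i, & \text{otherwise.}
\end{cases} \tag{45}$$

Moreover, let $\hat{w}$ be the solution of (22). Then

$$\hat{w}(\sigma, g, h, \tilde{x}) = \begin{cases}
-\dfrac{\sigma (h + \hat{\nu} z^{(1)} - \hat{\lambda}^{(2)})}{\sqrt{\|g\|_2^2 - \|h + \hat{\nu} z^{(1)} - \hat{\lambda}^{(2)}\|_2^2}}, & \text{if } \sqrt{1 + \frac{\sigma^2}{C_w^2}}\, \|h + \hat{\nu} z^{(1)} - \hat{\lambda}^{(2)}\|_2 \le \|g\|_2 \\
-\dfrac{C_w (h + \tilde{\nu} z^{(1)} - \tilde{\lambda}^{(2)})}{\|h + \tilde{\nu} z^{(1)} - \tilde{\lambda}^{(2)}\|_2}, & \text{otherwise}
\end{cases} \tag{46}$$

and

$$\|\hat{w}(\sigma, g, h, \tilde{x})\|_2 = \begin{cases}
\dfrac{\sigma \|h + \hat{\nu} z^{(1)} - \hat{\lambda}^{(2)}\|_2}{\sqrt{\|g\|_2^2 - \|h + \hat{\nu} z^{(1)} - \hat{\lambda}^{(2)}\|_2^2}}, & \text{if } \sqrt{1 + \frac{\sigma^2}{C_w^2}}\, \|h + \hat{\nu} z^{(1)} - \hat{\lambda}^{(2)}\|_2 \le \|g\|_2 \\
C_w, & \text{otherwise.}
\end{cases} \tag{47}$$

Proof. The first part follows trivially. The second one follows from (35) by choosing the optimal $\hat{\nu}$ and $\hat{\lambda}^{(2)}$ (with $\gamma = 0$) or alternatively $\tilde{\nu}$ and $\tilde{\lambda}^{(2)}$ (with $\gamma = \gamma_{opt}$ from (41)). □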
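The closed forms in (41) and (43) can be sanity-checked numerically on scalar inputs. In the sketch below, `g_norm` stands for $\|g\|_2$ and `hz_norm` for $\|h + \nu z^{(1)} - \lambda^{(2)}\|_2$; these names, the helper functions, and the particular numbers are illustrative assumptions and only the $\gamma$-dependent part of the objective in (39) is evaluated.

```python
import math


def inner_objective(gamma, g_norm, hz_norm, sigma, c_w):
    """gamma-dependent part of (39) for fixed nu and lambda^(2)."""
    return (sigma * math.sqrt((g_norm + gamma) ** 2 - hz_norm ** 2)
            - gamma * math.sqrt(c_w ** 2 + sigma ** 2))


def gamma_opt(g_norm, hz_norm, sigma, c_w):
    """Stationary point given in (41)."""
    return math.sqrt(1.0 + sigma ** 2 / c_w ** 2) * hz_norm - g_norm


def closed_form(g_norm, hz_norm, sigma, c_w):
    """Value of the same part of the objective after substitution, as in (43)."""
    return math.sqrt(c_w ** 2 + sigma ** 2) * g_norm - c_w * hz_norm


# Example with gamma_opt >= 0, i.e. the "non-overwhelming" scenario.
g_norm, hz_norm, sigma, c_w = 1.0, 2.0, 3.0, 4.0
g_star = gamma_opt(g_norm, hz_norm, sigma, c_w)
value_at_opt = inner_objective(g_star, g_norm, hz_norm, sigma, c_w)
value_closed = closed_form(g_norm, hz_norm, sigma, c_w)
```

For these inputs `g_star` is 1.5 and both `value_at_opt` and `value_closed` equal -3 up to rounding; evaluating `inner_objective` slightly to either side of `g_star` gives smaller values, as expected for a maximizer.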
2.1.2 Concentration of $\xi(\sigma, g, h, \tilde{x})$

In this section we will show that $\xi(\sigma, g, h, \tilde{x})$ concentrates with high probability around its mean. To do so, instead of looking at (45) we will look back at (22), which is the original definition of $\xi(\sigma, g, h, \tilde{x})$. Now, before proceeding further we first recall the following incredible result from [20] related to the concentration of Lipschitz functions of Gaussian random variables.

Lemma 3 ([20, 54]). Let $f_{lip}(\cdot) : R^n \to R$ be a Lipschitz function such that $|f_{lip}(a) - f_{lip}(b)| \le c_{lip} \|a - b\|_2$. Let $a$ be a vector comprised of i.i.d. zero-mean, unit variance Gaussian random variables and let $\epsilon_{lip} > 0$. Then

$$P\left(|f_{lip}(a) - E f_{lip}(a)| \ge \epsilon_{lip} E f_{lip}(a)\right) \le \exp\left\{-\frac{(\epsilon_{lip} E f_{lip}(a))^2}{2 c_{lip}^2}\right\}. \tag{48}$$
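A quick Monte Carlo experiment can illustrate the type of statement made in Lemma 3. The sketch below uses $f_{lip}(a) = \|a\|_2$, which is Lipschitz with $c_{lip} = 1$, and compares an empirical tail frequency with the right-hand side of (48). It is an illustration only: the empirical mean stands in for $E f_{lip}(a)$, and the function names and default parameters are assumptions.

```python
import math
import random


def l2_norm(a):
    return math.sqrt(sum(t * t for t in a))


def tail_estimate(n=200, trials=2000, eps=0.2, seed=0):
    """Empirical tail of | ||a||_2 - mean | versus the bound in (48), c_lip = 1."""
    rng = random.Random(seed)
    values = [l2_norm([rng.gauss(0, 1) for _ in range(n)]) for _ in range(trials)]
    mean = sum(values) / trials            # stands in for E f_lip(a)
    t = eps * mean
    empirical = sum(abs(val - mean) >= t for val in values) / trials
    bound = math.exp(-t * t / 2.0)         # exp(-(eps * E f)^2 / (2 c_lip^2))
    return empirical, bound
```

Since $E\|a\|_2$ grows like $\sqrt{n}$, the exponent in the bound grows linearly in $n$ for fixed `eps`, which is the mechanism behind the "overwhelming probability" statements used below.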



In the following lemma we will show that $\xi(\sigma, g, h, \tilde{x})$ is a Lipschitz function. To do so, we will, roughly speaking, assume that $\|w\|_2$ in the definition of $\xi(\sigma, g, h, \tilde{x})$ is bounded by a large constant, say $C_w$. We recall here that our goal in this paper, though, is much bigger than creating a "constant type" bound on $\|w\|_2$. Namely, we will actually establish the precise value that $\|w\|_2$ takes in the worst case with overwhelming probability. Clearly, knowing that, one could then use a much better value than $C_w$ to upper bound $\|w\|_2$ in the definition of $\xi(\sigma, g, h, \tilde{x})$. However, for the purposes of the concentration inequalities any constant (of course independent of $n$) is fine. In fact, any sub-root dependence on $n$ would be fine too; it is just that in that case "overwhelming" would no longer be negative exponential.

Lemma 4. Let $g$ and $h$ be $m$ and $n$ dimensional vectors, respectively, with i.i.d. standard normal variables as their components. Let $\sigma > 0$ be an arbitrary scalar. Let $\xi(\sigma, g, h, \tilde{x})$ be as in (22). Further let $\epsilon_{lip} > 0$ be any constant. Then

$$P\left(|\xi(\sigma, g, h, \tilde{x}) - E\xi(\sigma, g, h, \tilde{x})| \ge \epsilon_{lip} E\xi(\sigma, g, h, \tilde{x})\right) \le \exp\left\{-\frac{(\epsilon_{lip} E\xi(\sigma, g, h, \tilde{x}))^2}{2(2C_w^2 + \sigma^2)}\right\}. \tag{49}$$



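Proof. We start by setting

$$f_{lip}(\mathbf{g}^{(1)},\mathbf{h}^{(1)}) = \min_{[\mathbf{w}^T \sigma]^T \in S_w(\sigma,\tilde{\mathbf{x}},C_w)} \left(\sqrt{\|\mathbf{w}\|_2^2 + \sigma^2}\,\|\mathbf{g}^{(1)}\|_2 + \sum_{i=1}^n \mathbf{h}_i^{(1)}\mathbf{w}_i\right). \qquad (50)$$

Further, let $\mathbf{w}^{(lip_1)}$ be the solution of the minimization in (50). Then, clearly

$$f_{lip}(\mathbf{g}^{(1)},\mathbf{h}^{(1)}) = \sqrt{\|\mathbf{w}^{(lip_1)}\|_2^2 + \sigma^2}\,\|\mathbf{g}^{(1)}\|_2 + \sum_{i=1}^n \mathbf{h}_i^{(1)}(\mathbf{w}^{(lip_1)})_i, \qquad (51)$$

where $(\mathbf{w}^{(lip_1)})_i$ is the $i$-th component of $\mathbf{w}^{(lip_1)}$. In an analogous fashion set

$$f_{lip}(\mathbf{g}^{(2)},\mathbf{h}^{(2)}) = \min_{[\mathbf{w}^T \sigma]^T \in S_w(\sigma,\tilde{\mathbf{x}},C_w)} \left(\sqrt{\|\mathbf{w}\|_2^2 + \sigma^2}\,\|\mathbf{g}^{(2)}\|_2 + \sum_{i=1}^n \mathbf{h}_i^{(2)}\mathbf{w}_i\right) \qquad (52)$$

and let $\mathbf{w}^{(lip_2)}$ be the solution of the minimization in (52). Then again clearly

$$f_{lip}(\mathbf{g}^{(2)},\mathbf{h}^{(2)}) = \sqrt{\|\mathbf{w}^{(lip_2)}\|_2^2 + \sigma^2}\,\|\mathbf{g}^{(2)}\|_2 + \sum_{i=1}^n \mathbf{h}_i^{(2)}(\mathbf{w}^{(lip_2)})_i, \qquad (53)$$

where of course $(\mathbf{w}^{(lip_2)})_i$ is the $i$-th component of $\mathbf{w}^{(lip_2)}$. Now assume that $f_{lip}(\mathbf{g}^{(1)},\mathbf{h}^{(1)}) \neq f_{lip}(\mathbf{g}^{(2)},\mathbf{h}^{(2)})$ (if
they are equal we are trivially done). Further let $f_{lip}(\mathbf{g}^{(1)},\mathbf{h}^{(1)}) < f_{lip}(\mathbf{g}^{(2)},\mathbf{h}^{(2)})$ (the rest of the argument
of course can trivially be flipped if $f_{lip}(\mathbf{g}^{(1)},\mathbf{h}^{(1)}) > f_{lip}(\mathbf{g}^{(2)},\mathbf{h}^{(2)})$). We then have

$$\begin{aligned}
|f_{lip}(\mathbf{g}^{(2)},\mathbf{h}^{(2)}) - f_{lip}(\mathbf{g}^{(1)},\mathbf{h}^{(1)})| &= f_{lip}(\mathbf{g}^{(2)},\mathbf{h}^{(2)}) - f_{lip}(\mathbf{g}^{(1)},\mathbf{h}^{(1)}) \\
&= \left(\sqrt{\|\mathbf{w}^{(lip_2)}\|_2^2 + \sigma^2}\,\|\mathbf{g}^{(2)}\|_2 + \sum_{i=1}^n \mathbf{h}_i^{(2)}(\mathbf{w}^{(lip_2)})_i\right) - \left(\sqrt{\|\mathbf{w}^{(lip_1)}\|_2^2 + \sigma^2}\,\|\mathbf{g}^{(1)}\|_2 + \sum_{i=1}^n \mathbf{h}_i^{(1)}(\mathbf{w}^{(lip_1)})_i\right) \\
&\leq \left(\sqrt{\|\mathbf{w}^{(lip_1)}\|_2^2 + \sigma^2}\,\|\mathbf{g}^{(2)}\|_2 + \sum_{i=1}^n \mathbf{h}_i^{(2)}(\mathbf{w}^{(lip_1)})_i\right) - \left(\sqrt{\|\mathbf{w}^{(lip_1)}\|_2^2 + \sigma^2}\,\|\mathbf{g}^{(1)}\|_2 + \sum_{i=1}^n \mathbf{h}_i^{(1)}(\mathbf{w}^{(lip_1)})_i\right) \\
&= \sqrt{\|\mathbf{w}^{(lip_1)}\|_2^2 + \sigma^2}\,(\|\mathbf{g}^{(2)}\|_2 - \|\mathbf{g}^{(1)}\|_2) + \sum_{i=1}^n (\mathbf{h}_i^{(2)} - \mathbf{h}_i^{(1)})(\mathbf{w}^{(lip_1)})_i \\
&\leq \sqrt{\|\mathbf{w}^{(lip_1)}\|_2^2 + \sigma^2}\,\|\mathbf{g}^{(2)} - \mathbf{g}^{(1)}\|_2 + \|\mathbf{h}^{(2)} - \mathbf{h}^{(1)}\|_2\|\mathbf{w}^{(lip_1)}\|_2 \\
&\leq \sqrt{2\|\mathbf{w}^{(lip_1)}\|_2^2 + \sigma^2}\,\sqrt{\|\mathbf{g}^{(2)} - \mathbf{g}^{(1)}\|_2^2 + \|\mathbf{h}^{(2)} - \mathbf{h}^{(1)}\|_2^2} \\
&\leq \sqrt{2C_w^2 + \sigma^2}\,\sqrt{\|\mathbf{g}^{(2)} - \mathbf{g}^{(1)}\|_2^2 + \|\mathbf{h}^{(2)} - \mathbf{h}^{(1)}\|_2^2}, \qquad (54)
\end{aligned}$$

where the first inequality follows by sub-optimality of $\mathbf{w}^{(lip_1)}$ in (52), the third one by the Cauchy-Schwarz inequality, and the last one by $\|\mathbf{w}^{(lip_1)}\|_2 \leq C_w$. Connecting beginning and end in (54)
and combining it with (50) one then has that $\xi(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}})$ is Lipschitz with $c_{lip} = \sqrt{2C_w^2 + \sigma^2}$. (49) then
easily follows by Lemma 3. □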
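The Lipschitz constant $\sqrt{2C_w^2 + \sigma^2}$ from the proof can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the feasible set $S_w$ is replaced by a hypothetical finite set of random points inside the ball $\|\mathbf{w}\|_2 \leq C_w$ (the argument in (54) only uses that bound), and all names and parameter values are made up for the example.

```python
import math
import random


def f_lip(g, h, candidates, sigma):
    """min over a finite candidate set of sqrt(||w||^2 + sigma^2) * ||g||_2 + h^T w."""
    g_norm = math.sqrt(sum(x * x for x in g))
    return min(
        math.sqrt(sum(x * x for x in w) + sigma ** 2) * g_norm
        + sum(hi * wi for hi, wi in zip(h, w))
        for w in candidates
    )


def max_lipschitz_ratio(m=8, n=6, c_w=2.0, sigma=0.7, pairs=200, seed=1):
    """Largest observed |f(g2,h2) - f(g1,h1)| / (c_lip * distance) over random pairs."""
    rng = random.Random(seed)

    def gauss_vec(d):
        return [rng.gauss(0.0, 1.0) for _ in range(d)]

    # Hypothetical stand-in for S_w: random points rescaled into the ball ||w||_2 <= C_w.
    candidates = []
    for _ in range(300):
        w = gauss_vec(n)
        scale = c_w * rng.random() / math.sqrt(sum(x * x for x in w))
        candidates.append([scale * x for x in w])

    c_lip = math.sqrt(2 * c_w ** 2 + sigma ** 2)
    worst = 0.0
    for _ in range(pairs):
        g1, h1, g2, h2 = gauss_vec(m), gauss_vec(n), gauss_vec(m), gauss_vec(n)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(g1 + h1, g2 + h2)))
        gap = abs(f_lip(g2, h2, candidates, sigma) - f_lip(g1, h1, candidates, sigma))
        worst = max(worst, gap / (c_lip * dist))
    return worst


worst_ratio = max_lipschitz_ratio()
print(f"largest observed ratio = {worst_ratio:.4f} (should not exceed 1)")
```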

One then has that ||h + z>zW — A( 2 ) ||2 and ||h + uz^ — A( 2 ) H2 concentrate as well which automatically 
implies that w also concentrates. More formally, one then has analogues to (49) 

P(\ ||h + z>z« -Xm\\ 2 -E\\h + z>z« - \& || 2 | > e^ m) E\\\v + z>z« - \&\\ 2 ) < e -4 norm) ™ 
P(||| h + Pz« - \&\\ 2 - E\\h + Pz« - AP)|| 2 | > e^° rm) ^||h + Pz« - AW|| 2 ) < e -4 n ° rm) ™ 

P(|||w|| 2 -£||w|| 2 | > eS w) E||w|| 2 ) < e-4 w) ™,(55) 



■ , (norm) n (norm) n ■ (w) „ uv-i 11 t j (norm) (norm) 

where as usual q > 0, e 2 > 0, and q > are arbitrarily small constants and ' , e\ 

and e 2 w ^ are constant dependent on e ( norm ) > o, e ^ norm ) > q, and > 0, respectively, but independent 

of n. 
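Now, we return to the probabilistic analysis of (21). Combining (21), (22), and (49) we have

$$\begin{aligned}
p_l &= P\left(\min_{[\mathbf{w}^T \sigma]^T \in S_w(\sigma,\tilde{\mathbf{x}},C_w)} \left(\sqrt{\|\mathbf{w}\|_2^2 + \sigma^2}\,\|\mathbf{g}\|_2 + \sum_{i=1}^n \mathbf{h}_i\mathbf{w}_i + \mathbf{h}_{n+1}\sigma\right) \geq \zeta_{obj}^{(l)}\right) \\
&= P\left(\xi(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) + \mathbf{h}_{n+1}\sigma \geq \zeta_{obj}^{(l)}\right) \\
&\geq \left(1 - \exp\left\{-\frac{(\epsilon_{lip} E\xi(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}))^2}{2(2C_w^2 + \sigma^2)}\right\}\right) P\left((1 - \epsilon_{lip})E\xi(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) + \mathbf{h}_{n+1}\sigma \geq \zeta_{obj}^{(l)}\right). \qquad (56)
\end{aligned}$$

Since $\mathbf{h}_{n+1}$ is a standard normal one easily has $P(\mathbf{h}_{n+1}\sigma \geq -\epsilon_1^{(h)}\sqrt{n}) \geq 1 - e^{-\epsilon_2^{(h)} n}$ where $\epsilon_1^{(h)} > 0$ is an
arbitrarily small constant and $\epsilon_2^{(h)}$ is a constant dependent on $\epsilon_1^{(h)}$ and $\sigma$ but independent of $n$. By choosing

$$\zeta_{obj}^{(l)} = (1 - \epsilon_{lip})E\xi(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) - \epsilon_1^{(h)}\sqrt{n} \qquad (57)$$

one then from (56) has

$$\begin{aligned}
p_l &= P\left(\min_{[\mathbf{w}^T \sigma]^T \in S_w(\sigma,\tilde{\mathbf{x}},C_w)} \left(\sqrt{\|\mathbf{w}\|_2^2 + \sigma^2}\,\|\mathbf{g}\|_2 + \sum_{i=1}^n \mathbf{h}_i\mathbf{w}_i + \mathbf{h}_{n+1}\sigma\right) \geq (1 - \epsilon_{lip})E\xi(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) - \epsilon_1^{(h)}\sqrt{n}\right) \\
&\geq \left(1 - \exp\left\{-\frac{(\epsilon_{lip} E\xi(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}))^2}{2(2C_w^2 + \sigma^2)}\right\}\right)\left(1 - e^{-\epsilon_2^{(h)} n}\right). \qquad (58)
\end{aligned}$$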

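As stated after (20), (58) is conceptually enough to establish a "high probability" lower bound on $\zeta_{obj}$. The
next few steps that formally do so are rather obvious but we include them for completeness. Combining
(19) and (58) we obtain

$$P\left(\min_{[\mathbf{w}^T \sigma]^T \in S_w(\sigma,\tilde{\mathbf{x}},C_w)} \left(f_{obj}(\sigma,\mathbf{w}) + \sqrt{\|\mathbf{w}\|_2^2 + \sigma^2}\,g\right) \geq \zeta_{obj}^{(l)}\right) \geq \left(1 - \exp\left\{-\frac{(\epsilon_{lip} E\xi(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}))^2}{2(2C_w^2 + \sigma^2)}\right\}\right)\left(1 - e^{-\epsilon_2^{(h)} n}\right), \qquad (59)$$

where $\zeta_{obj}^{(l)}$ is as in (57). Now, one further has

$$P\left(\min_{[\mathbf{w}^T \sigma]^T \in S_w(\sigma,\tilde{\mathbf{x}},C_w)} \left(f_{obj}(\sigma,\mathbf{w}) + \sqrt{\|\mathbf{w}\|_2^2 + \sigma^2}\,g\right) \geq \zeta_{obj}^{(l)}\right) \leq P\left(\min_{[\mathbf{w}^T \sigma]^T \in S_w(\sigma,\tilde{\mathbf{x}},C_w)} \left(f_{obj}(\sigma,\mathbf{w})\right) + \sqrt{C_w^2 + \sigma^2}\,g \geq \zeta_{obj}^{(l)}\right). \qquad (60)$$

Since $g$ is a standard normal one easily again has $P(g\sqrt{C_w^2 + \sigma^2} \leq \epsilon_1^{(g)}\sqrt{n}) \geq 1 - e^{-\epsilon_2^{(g)} n}$ where $\epsilon_1^{(g)} > 0$
is an arbitrarily small constant and $\epsilon_2^{(g)}$ is a constant dependent on $\epsilon_1^{(g)}$, $\sigma$, and $C_w$ but independent of $n$.
Applying this to the term on the right hand side of the above inequality one obtains

$$P\left(\min_{[\mathbf{w}^T \sigma]^T \in S_w(\sigma,\tilde{\mathbf{x}},C_w)} \left(f_{obj}(\sigma,\mathbf{w})\right) + \sqrt{C_w^2 + \sigma^2}\,g \geq \zeta_{obj}^{(l)}\right) \leq P\left(\min_{[\mathbf{w}^T \sigma]^T \in S_w(\sigma,\tilde{\mathbf{x}},C_w)} \left(f_{obj}(\sigma,\mathbf{w})\right) \geq \zeta_{obj}^{(l)} - \epsilon_1^{(g)}\sqrt{n}\right) + e^{-\epsilon_2^{(g)} n}. \qquad (61)$$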
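Now let $\zeta_{obj}^{(lower)} = \zeta_{obj}^{(l)} - \epsilon_1^{(g)}\sqrt{n}$. From (57) then obviously

$$\zeta_{obj}^{(lower)} = (1 - \epsilon_{lip})E\xi(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) - \epsilon_1^{(h)}\sqrt{n} - \epsilon_1^{(g)}\sqrt{n}. \qquad (62)$$

Also let $\epsilon_{lower}$ be a constant such that

$$1 - e^{-\epsilon_{lower} n} < \left(1 - \exp\left\{-\frac{(\epsilon_{lip} E\xi(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}))^2}{2(2C_w^2 + \sigma^2)}\right\}\right)\left(1 - e^{-\epsilon_2^{(h)} n}\right) - e^{-\epsilon_2^{(g)} n}. \qquad (63)$$

Then a combination of (11), (59), (60), (61), (62), and (63) gives

$$P\left(\min_{[\mathbf{w}^T \sigma]^T \in S_w(\sigma,\tilde{\mathbf{x}},C_w)} \left(f_{obj}(\sigma,\mathbf{w})\right) \geq \zeta_{obj}^{(lower)}\right) \geq \left(1 - e^{-\epsilon_{lower} n}\right)\left(1 - e^{-\epsilon_{C_w} n}\right). \qquad (64)$$

We summarize the results from this subsection in the following lemma.

Lemma 5. Let $\mathbf{v}$ be an $m \times 1$ vector of i.i.d. zero-mean variance $\sigma^2$ Gaussian random variables and let
$A$ be an $m \times n$ matrix of i.i.d. standard normal random variables. Consider an $\tilde{\mathbf{x}}$ defined in (7) and a $\mathbf{y}$
defined in (3) for $\mathbf{x} = \tilde{\mathbf{x}}$. Let then $\zeta_{obj}$ be as defined in (10) and let $\hat{\mathbf{w}}$ be the solution of (14). Assume
$P(\|\hat{\mathbf{w}}\|_2 \leq C_w) \geq 1 - e^{-\epsilon_{C_w} n}$ for an arbitrarily large constant $C_w$ and a constant $\epsilon_{C_w} > 0$ dependent on
$C_w$ but independent of $n$. Then there is a constant $\epsilon_{lower} > 0$ such that

$$P\left(\zeta_{obj} \geq \zeta_{obj}^{(lower)}\right) \geq \left(1 - e^{-\epsilon_{lower} n}\right)\left(1 - e^{-\epsilon_{C_w} n}\right), \qquad (65)$$

where

$$\zeta_{obj}^{(lower)} = (1 - \epsilon_{lip})E\xi(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) - \epsilon_1^{(h)}\sqrt{n} - \epsilon_1^{(g)}\sqrt{n}, \qquad (66)$$

$\xi(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}})$ is as defined in (22), and $\epsilon_{lip}$, $\epsilon_1^{(h)}$, $\epsilon_1^{(g)}$ are all positive arbitrarily small constants.

Proof. Follows from the previous discussion. □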

2.2 Upper-bounding $\zeta_{obj}$

In this section we present a general framework for finding a "high-probability" upper bound on $\zeta_{obj}$. To that
end, let $r$ and $C_{w_{up}}$ be positive scalars (in this subsection we present a general framework and take these
scalars to be arbitrary; however, to make the bound as tight as possible, in the following subsection we will
make them take particular values). Now, if we can show that there is a $\mathbf{w} \in R^n$ such that $\|\tilde{\mathbf{x}} + \mathbf{w}\|_1 \leq \|\tilde{\mathbf{x}}\|_1$
and $\|\sigma\mathbf{v} - A\mathbf{w}\|_2 \leq r$ with overwhelming probability then $r$ can act as an upper bound on $\zeta_{obj}$. We then start
by looking at the following optimization problem

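$$\begin{aligned}
\min_{\mathbf{w}} \quad & \|\tilde{\mathbf{x}} + \mathbf{w}\|_1 - \|\tilde{\mathbf{x}}\|_1 \\
\text{subject to} \quad & \left\|A_v \begin{bmatrix} \mathbf{w} \\ \sigma \end{bmatrix}\right\|_2 \leq r \\
& \|\mathbf{w}\|_2^2 \leq C_{w_{up}}^2, \qquad (67)
\end{aligned}$$

where $A_v$ is as defined right after (14). If we can show that with overwhelming probability the objective
value of the above optimization problem is negative then $r$ will be a valid "high probability" upper bound
on $\zeta_{obj}$. Moreover, it will be achieved by a $\mathbf{w}$ for which it will hold that $\|\mathbf{w}\|_2 \leq C_{w_{up}}$.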
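We now proceed in a fashion similar to the one from Subsection 2.1.1. To remove the absolute values
we introduce auxiliary variables $\mathbf{t}_i$, $1 \leq i \leq n$, and transform the above problem to

$$\begin{aligned}
\min_{\mathbf{w},\mathbf{t}} \quad & \sum_{i=1}^n \mathbf{t}_i - \|\tilde{\mathbf{x}}\|_1 \\
\text{subject to} \quad & \tilde{\mathbf{x}}_i + \mathbf{w}_i - \mathbf{t}_i \leq 0, \quad n - k + 1 \leq i \leq n \\
& -\tilde{\mathbf{x}}_i - \mathbf{w}_i - \mathbf{t}_i \leq 0, \quad n - k + 1 \leq i \leq n \\
& \mathbf{w}_i - \mathbf{t}_i \leq 0, \quad 1 \leq i \leq n - k \\
& -\mathbf{w}_i - \mathbf{t}_i \leq 0, \quad 1 \leq i \leq n - k \\
& \left\|A_v \begin{bmatrix} \mathbf{w} \\ \sigma \end{bmatrix}\right\|_2 \leq r \\
& \|\mathbf{w}\|_2^2 \leq C_{w_{up}}^2. \qquad (68)
\end{aligned}$$

We also slightly modify the first of the constraints from (67) in the following way

$$\begin{aligned}
\min_{\mathbf{w},\mathbf{t},\mathbf{b}} \quad & \sum_{i=1}^n \mathbf{t}_i - \|\tilde{\mathbf{x}}\|_1 \\
\text{subject to} \quad & \tilde{\mathbf{x}}_i + \mathbf{w}_i - \mathbf{t}_i \leq 0, \quad n - k + 1 \leq i \leq n \\
& -\tilde{\mathbf{x}}_i - \mathbf{w}_i - \mathbf{t}_i \leq 0, \quad n - k + 1 \leq i \leq n \\
& \mathbf{w}_i - \mathbf{t}_i \leq 0, \quad 1 \leq i \leq n - k \\
& -\mathbf{w}_i - \mathbf{t}_i \leq 0, \quad 1 \leq i \leq n - k \\
& \|\mathbf{b}\|_2^2 \leq r^2 \\
& \begin{bmatrix} -A & \mathbf{v} \end{bmatrix} \begin{bmatrix} \mathbf{w} \\ \sigma \end{bmatrix} = \mathbf{b} \\
& \|\mathbf{w}\|_2^2 \leq C_{w_{up}}^2. \qquad (69)
\end{aligned}$$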



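The Lagrangian of the above problem then becomes

$$\begin{aligned}
\mathcal{L}(\lambda^{(1)},\lambda^{(2)},\nu^{(1)},\gamma_1,\gamma_2,\mathbf{w},\mathbf{t},\mathbf{b}) ={} & \sum_{i=1}^n \mathbf{t}_i - \|\tilde{\mathbf{x}}\|_1 + \sum_{i=n-k+1}^n \lambda_i^{(1)}(\tilde{\mathbf{x}}_i + \mathbf{w}_i - \mathbf{t}_i) + \sum_{i=n-k+1}^n \lambda_i^{(2)}(-\tilde{\mathbf{x}}_i - \mathbf{w}_i - \mathbf{t}_i) \\
& + \sum_{i=1}^{n-k} \lambda_i^{(1)}(\mathbf{w}_i - \mathbf{t}_i) + \sum_{i=1}^{n-k} \lambda_i^{(2)}(-\mathbf{w}_i - \mathbf{t}_i) - \nu^{(1)}A\mathbf{w} + \nu^{(1)}\mathbf{v}\sigma - \nu^{(1)}\mathbf{b} \\
& + \gamma_1\left(\sum_{i=1}^m \mathbf{b}_i^2 - r^2\right) + \gamma_2\left(\|\mathbf{w}\|_2^2 - C_{w_{up}}^2\right), \qquad (70)
\end{aligned}$$

where $\nu^{(1)}$ is a $1 \times m$ row vector of Lagrange variables and $\lambda^{(1)}$, $\lambda^{(2)}$ are as in previous sections. After
rearranging terms we further have

$$\begin{aligned}
\mathcal{L}(\lambda^{(1)},\lambda^{(2)},\nu^{(1)},\gamma_1,\gamma_2,\mathbf{w},\mathbf{t},\mathbf{b}) ={} & -\|\tilde{\mathbf{x}}\|_1 + \sum_{i=1}^n \mathbf{t}_i(1 - \lambda_i^{(1)} - \lambda_i^{(2)}) + \sum_{i=n-k+1}^n \lambda_i^{(1)}(\tilde{\mathbf{x}}_i + \mathbf{w}_i) + \sum_{i=n-k+1}^n \lambda_i^{(2)}(-\tilde{\mathbf{x}}_i - \mathbf{w}_i) \\
& + \sum_{i=1}^{n-k} \lambda_i^{(1)}\mathbf{w}_i - \sum_{i=1}^{n-k} \lambda_i^{(2)}\mathbf{w}_i - \nu^{(1)}A\mathbf{w} + \nu^{(1)}\mathbf{v}\sigma - \nu^{(1)}\mathbf{b} \\
& + \gamma_1\left(\sum_{i=1}^m \mathbf{b}_i^2 - r^2\right) + \gamma_2\left(\|\mathbf{w}\|_2^2 - C_{w_{up}}^2\right). \qquad (71)
\end{aligned}$$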
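After a few further arrangements we finally have

$$\begin{aligned}
\mathcal{L}(\lambda^{(1)},\lambda^{(2)},\nu^{(1)},\gamma_1,\gamma_2,\mathbf{w},\mathbf{t},\mathbf{b}) ={} & \sum_{i=1}^n \mathbf{t}_i(1 - \lambda_i^{(1)} - \lambda_i^{(2)}) + \sum_{i=n-k+1}^n (\lambda_i^{(1)} - \lambda_i^{(2)} - 1)\tilde{\mathbf{x}}_i \\
& + \left((\lambda^{(1)} - \lambda^{(2)})^T - \nu^{(1)}A\right)\mathbf{w} + \nu^{(1)}\mathbf{v}\sigma - \nu^{(1)}\mathbf{b} + \gamma_1\left(\sum_{i=1}^m \mathbf{b}_i^2 - r^2\right) + \gamma_2\left(\|\mathbf{w}\|_2^2 - C_{w_{up}}^2\right). \qquad (72)
\end{aligned}$$

Setting $(1 - \lambda_i^{(1)} - \lambda_i^{(2)}) = 0$, $1 \leq i \leq n$ (to ensure that the dual is bounded), we have

$$\begin{aligned}
\mathcal{L}(\lambda^{(2)},\nu^{(1)},\gamma_1,\gamma_2,\mathbf{w},\mathbf{b}) ={} & -2\sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i + \left((\mathbf{z}^{(1)} - 2\lambda^{(2)})^T - \nu^{(1)}A\right)\mathbf{w} + \nu^{(1)}\mathbf{v}\sigma - \nu^{(1)}\mathbf{b} \\
& + \gamma_1\left(\sum_{i=1}^m \mathbf{b}_i^2 - r^2\right) + \gamma_2\left(\|\mathbf{w}\|_2^2 - C_{w_{up}}^2\right). \qquad (73)
\end{aligned}$$

Finally we can write a dual problem to (69)

$$\begin{aligned}
\max_{\lambda^{(2)},\nu^{(1)},\gamma_1,\gamma_2} \min_{\mathbf{w},\mathbf{b}} \quad & \mathcal{L}(\lambda^{(2)},\nu^{(1)},\gamma_1,\gamma_2,\mathbf{w},\mathbf{b}) \\
\text{subject to} \quad & 0 \leq \lambda_i^{(2)} \leq 1, \quad 1 \leq i \leq n \\
& \gamma_1 \geq 0 \\
& \gamma_2 \geq 0, \qquad (74)
\end{aligned}$$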
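where we of course use the fact that strong duality obviously holds. Now, we minimize over $\mathbf{w}$ by setting
the derivatives to zero

$$\frac{d\mathcal{L}(\lambda^{(2)},\nu^{(1)},\gamma_1,\gamma_2,\mathbf{w},\mathbf{b})}{d\mathbf{w}} = \left((\mathbf{z}^{(1)} - 2\lambda^{(2)})^T - \nu^{(1)}A\right)^T + 2\gamma_2\mathbf{w}. \qquad (75)$$

From (75) we easily have

$$\mathbf{w} = -\frac{\left((\mathbf{z}^{(1)} - 2\lambda^{(2)})^T - \nu^{(1)}A\right)^T}{2\gamma_2}. \qquad (76)$$

Plugging (76) back in (73) we further have

$$\mathcal{L}(\lambda^{(2)},\nu^{(1)},\gamma_1,\gamma_2,\mathbf{b}) = -2\sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i - \frac{\|(\mathbf{z}^{(1)} - 2\lambda^{(2)})^T - \nu^{(1)}A\|_2^2}{4\gamma_2} + \nu^{(1)}\mathbf{v}\sigma - \nu^{(1)}\mathbf{b} + \gamma_1\left(\sum_{i=1}^m \mathbf{b}_i^2 - r^2\right) - \gamma_2 C_{w_{up}}^2. \qquad (77)$$

Now, we minimize $\mathcal{L}(\lambda^{(2)},\nu^{(1)},\gamma_1,\gamma_2,\mathbf{b})$ over $\mathbf{b}$ by setting the derivatives to zero

$$\frac{d\mathcal{L}(\lambda^{(2)},\nu^{(1)},\gamma_1,\gamma_2,\mathbf{b})}{d\mathbf{b}} = -(\nu^{(1)})^T + 2\gamma_1\mathbf{b}. \qquad (78)$$

From (78) we easily have

$$\mathbf{b} = \frac{(\nu^{(1)})^T}{2\gamma_1}. \qquad (79)$$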
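Plugging (79) back in (77) we have

$$\mathcal{L}(\lambda^{(2)},\nu^{(1)},\gamma_1,\gamma_2) = -2\sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i - \frac{\|(\mathbf{z}^{(1)} - 2\lambda^{(2)})^T - \nu^{(1)}A\|_2^2}{4\gamma_2} + \nu^{(1)}\mathbf{v}\sigma - \frac{\|\nu^{(1)}\|_2^2}{4\gamma_1} - \gamma_1 r^2 - \gamma_2 C_{w_{up}}^2 \qquad (80)$$

and finally an equivalent to (69)

$$\begin{aligned}
\max_{\lambda^{(2)},\nu^{(1)},\gamma_1,\gamma_2} \quad & \mathcal{L}(\lambda^{(2)},\nu^{(1)},\gamma_1,\gamma_2) \\
\text{subject to} \quad & 0 \leq \lambda_i^{(2)} \leq 1, \quad 1 \leq i \leq n \\
& \gamma_1 \geq 0 \\
& \gamma_2 \geq 0. \qquad (81)
\end{aligned}$$

After doing the trivial maximization over $\gamma_1$ and $\gamma_2$ one obtains

$$\begin{aligned}
\max_{\lambda^{(2)},\nu^{(1)}} \quad & -2\sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i - C_{w_{up}}\|(\mathbf{z}^{(1)} - 2\lambda^{(2)})^T - \nu^{(1)}A\|_2 + \nu^{(1)}\mathbf{v}\sigma - \|\nu^{(1)}\|_2 r \\
\text{subject to} \quad & 0 \leq \lambda_i^{(2)} \leq 1, \quad 1 \leq i \leq n. \qquad (82)
\end{aligned}$$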
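For completeness, the maximization over $\gamma_1$ and $\gamma_2$ that leads to (82) rests on one scalar identity, sketched below. The symbols $a$ and $c$ are placeholders introduced only for this illustration; the case $a = 0$ is handled by letting $\gamma \rightarrow 0$.

```latex
% Scalar identity behind the step from (80)-(81) to (82), for a > 0 and c > 0:
\[
\max_{\gamma > 0}\left(-\frac{a^2}{4\gamma} - \gamma c^2\right) = -ac,
\qquad \text{attained at } \gamma^{*} = \frac{a}{2c},
\]
% since the derivative a^2/(4\gamma^2) - c^2 vanishes at \gamma^{*} and
% -a^2/(4\gamma^{*}) - \gamma^{*}c^2 = -ac/2 - ac/2.
% It is applied twice:
%   (a, c) = ( \|(z^{(1)} - 2\lambda^{(2)})^T - \nu^{(1)}A\|_2 ,\ C_{w_{up}} )  for \gamma_2,
%   (a, c) = ( \|\nu^{(1)}\|_2 ,\ r )                                           for \gamma_1.
```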
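We rewrite (82) in a slightly more convenient form

$$\begin{aligned}
-\min_{\lambda^{(2)},\nu^{(1)}} \max_{\|\mathbf{a}\|_2 = C_{w_{up}}} \quad & \left((\mathbf{z}^{(1)} - 2\lambda^{(2)})^T - \nu^{(1)}A\right)\mathbf{a} - \nu^{(1)}\mathbf{v}\sigma + \|\nu^{(1)}\|_2 r + 2\sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i \\
\text{subject to} \quad & 0 \leq \lambda_i^{(2)} \leq 1, \quad 1 \leq i \leq n. \qquad (83)
\end{aligned}$$

Now let us define $f_{obj}^{(up)}$ as

$$\begin{aligned}
-f_{obj}^{(up)} = -\min_{\lambda^{(2)},\nu^{(1)}} \max_{\|\mathbf{a}\|_2 = C_{w_{up}}} \quad & \left((\mathbf{z}^{(1)} - 2\lambda^{(2)})^T - \nu^{(1)}A\right)\mathbf{a} - \nu^{(1)}\mathbf{v}\sigma + \|\nu^{(1)}\|_2 r + 2\sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i \\
\text{subject to} \quad & 0 \leq \lambda_i^{(2)} \leq 1, \quad 1 \leq i \leq n. \qquad (84)
\end{aligned}$$

Any $r$ such that $\lim_{n \rightarrow \infty} P(f_{obj}^{(up)} > 0) = 1$ is then a valid "high-probability" upper bound.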

We now introduce a refinement of a lemma from [62] which itself is a slightly modified Lemma 1 
(Lemma 1 is of course the backbone of the escape through a mesh theorem utilized in [63]). 

Lemma 6. Let $A$ be an $m \times n$ matrix with i.i.d. standard normal components. Let $\mathbf{g}$ and $\mathbf{h}$ be $m \times 1$ and
$(n + 1) \times 1$ vectors, respectively, with i.i.d. standard normal components. Also, let $g$ be a standard normal
random variable and let $\Lambda$ be a set such that $\Lambda = \{\lambda^{(2)} \,|\, 0 \leq \lambda_i^{(2)} \leq 1, 1 \leq i \leq n\}$. Then



P{ min max (— \A vl 

X(2) eA ,uWeR n \0\\ah=C Wup L J 



+ ll^ (1) ||25-V'a,A(2),,(l))>0) 



n m 



i=l i=l 

(85) 
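Let

$$\psi_{\mathbf{a},\lambda^{(2)},\nu^{(1)}} = \epsilon_1^{(g)}\sqrt{n}\|\nu^{(1)}\|_2 - \mathbf{a}^T(\mathbf{z}^{(1)} - 2\lambda^{(2)}) - \|\nu^{(1)}\|_2 r - 2\sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i, \qquad (86)$$

with $\epsilon_1^{(g)} > 0$ being an arbitrarily small constant independent of $n$. The left-hand side of the inequality in
(85) is then the following probability of interest

$$\begin{aligned}
p_u = P\Bigg(\min_{\lambda^{(2)} \in \Lambda,\, \nu^{(1)} \in R^m \setminus 0}\ \max_{\|\mathbf{a}\|_2 = C_{w_{up}}} \Bigg( & \|\nu^{(1)}\|_2\left(\sum_{i=1}^n \mathbf{h}_i\mathbf{a}_i + \mathbf{h}_{n+1}\sigma\right) + \sqrt{C_{w_{up}}^2 + \sigma^2}\sum_{i=1}^m \mathbf{g}_i\nu_i^{(1)} \\
& - \epsilon_1^{(g)}\sqrt{n}\|\nu^{(1)}\|_2 + \mathbf{a}^T(\mathbf{z}^{(1)} - 2\lambda^{(2)}) + \|\nu^{(1)}\|_2 r + 2\sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i\Bigg) \geq 0\Bigg).
\end{aligned}$$

After solving the inner maximization over $\mathbf{a}$ and pulling out $\|\nu^{(1)}\|_2$ one has

$$\begin{aligned}
p_u = P\Bigg(\min_{\lambda^{(2)} \in \Lambda,\, \nu^{(1)} \in R^m \setminus 0} \Bigg( & C_{w_{up}}\left\|\mathbf{h} + \frac{1}{\|\nu^{(1)}\|_2}(\mathbf{z}^{(1)} - 2\lambda^{(2)})\right\|_2 + \mathbf{h}_{n+1}\sigma + \sqrt{C_{w_{up}}^2 + \sigma^2}\sum_{i=1}^m \frac{\mathbf{g}_i\nu_i^{(1)}}{\|\nu^{(1)}\|_2} \\
& - \epsilon_1^{(g)}\sqrt{n} + r + 2\sum_{i=n-k+1}^n \frac{\lambda_i^{(2)}\tilde{\mathbf{x}}_i}{\|\nu^{(1)}\|_2}\Bigg) \geq 0\Bigg).
\end{aligned}$$

After minimization of the second term over a unit norm vector we further have

$$\begin{aligned}
p_u = P\Bigg(\min_{\lambda^{(2)} \in \Lambda,\, \nu^{(1)} \in R^m \setminus 0} \Bigg( & C_{w_{up}}\left\|\mathbf{h} + \frac{1}{\|\nu^{(1)}\|_2}(\mathbf{z}^{(1)} - 2\lambda^{(2)})\right\|_2 + \mathbf{h}_{n+1}\sigma - \sqrt{C_{w_{up}}^2 + \sigma^2}\,\|\mathbf{g}\|_2 \\
& - \epsilon_1^{(g)}\sqrt{n} + r + 2\sum_{i=n-k+1}^n \frac{\lambda_i^{(2)}\tilde{\mathbf{x}}_i}{\|\nu^{(1)}\|_2}\Bigg) \geq 0\Bigg). \qquad (87)
\end{aligned}$$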

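Now we change variables so that $\nu = \frac{1}{\|\nu^{(1)}\|_2}$ and, with a slight abuse of notation, $\lambda^{(2)} = \frac{2\lambda^{(2)}}{\|\nu^{(1)}\|_2}$, and redefine $\Lambda$ by setting

$$\Lambda^{(2)} = \{\lambda^{(2)} \in R^n \,|\, 0 \leq \lambda_i^{(2)} \leq 2\nu, 1 \leq i \leq n\}. \qquad (88)$$

We also recall that $\mathbf{z}^{(1)}$ remains as defined right after (36). Plugging all of this back in (87) gives us

$$p_u = P\left(r + \mathbf{h}_{n+1}\sigma - \epsilon_1^{(g)}\sqrt{n} - \max_{\lambda^{(2)} \in \Lambda^{(2)},\, \nu \geq 0}\left(\sqrt{C_{w_{up}}^2 + \sigma^2}\,\|\mathbf{g}\|_2 - C_{w_{up}}\|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2 - \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i\right) \geq 0\right). \qquad (89)$$

Now, let

$$\xi_{up}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},C_{w_{up}}) = \max_{\lambda^{(2)} \in \Lambda^{(2)},\, \nu \geq 0}\left(\sqrt{C_{w_{up}}^2 + \sigma^2}\,\|\mathbf{g}\|_2 - C_{w_{up}}\|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2 - \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i\right). \qquad (90)$$

In the following lemma we will show that $\xi_{up}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},C_{w_{up}})$ is a Lipschitz function.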

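Lemma 7. Let $\mathbf{g}$ and $\mathbf{h}$ be $m$ and $n$ dimensional vectors, respectively, with i.i.d. standard normal variables
as their components. Let $\sigma > 0$ be an arbitrary scalar. Let $\xi_{up}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},C_{w_{up}})$ be as in (90). Further let
$\epsilon_{lip} > 0$ be any constant. Then

$$P\left(|\xi_{up}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},C_{w_{up}}) - E\xi_{up}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},C_{w_{up}})| \geq \epsilon_{lip} E\xi_{up}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},C_{w_{up}})\right) \leq \exp\left\{-\frac{(\epsilon_{lip} E\xi_{up}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},C_{w_{up}}))^2}{2(2C_{w_{up}}^2 + \sigma^2)}\right\}. \qquad (91)$$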

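Proof. The proof will be similar to the corresponding one from Subsection 2.1.2. We start by setting

$$f_{lip}(\mathbf{g}^{(1)},\mathbf{h}^{(1)}) = \max_{\lambda^{(2)} \in \Lambda^{(2)},\, \nu \geq 0}\left(\sqrt{C_{w_{up}}^2 + \sigma^2}\,\|\mathbf{g}^{(1)}\|_2 - C_{w_{up}}\|\mathbf{h}^{(1)} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2 - \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i\right). \qquad (92)$$

Further, let $\nu^{(lip_1)}$ and $\lambda^{(lip_1)}$ be the solutions of the maximization in (92). Then, clearly

$$f_{lip}(\mathbf{g}^{(1)},\mathbf{h}^{(1)}) = \sqrt{C_{w_{up}}^2 + \sigma^2}\,\|\mathbf{g}^{(1)}\|_2 - C_{w_{up}}\|\mathbf{h}^{(1)} + \nu^{(lip_1)}\mathbf{z}^{(1)} - \lambda^{(lip_1)}\|_2 - \sum_{i=n-k+1}^n \lambda_i^{(lip_1)}\tilde{\mathbf{x}}_i. \qquad (93)$$

In an analogous fashion set

$$f_{lip}(\mathbf{g}^{(2)},\mathbf{h}^{(2)}) = \max_{\lambda^{(2)} \in \Lambda^{(2)},\, \nu \geq 0}\left(\sqrt{C_{w_{up}}^2 + \sigma^2}\,\|\mathbf{g}^{(2)}\|_2 - C_{w_{up}}\|\mathbf{h}^{(2)} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2 - \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i\right), \qquad (94)$$

and let $\nu^{(lip_2)}$ and $\lambda^{(lip_2)}$ be the solutions of the maximization in (94). Then, clearly

$$f_{lip}(\mathbf{g}^{(2)},\mathbf{h}^{(2)}) = \sqrt{C_{w_{up}}^2 + \sigma^2}\,\|\mathbf{g}^{(2)}\|_2 - C_{w_{up}}\|\mathbf{h}^{(2)} + \nu^{(lip_2)}\mathbf{z}^{(1)} - \lambda^{(lip_2)}\|_2 - \sum_{i=n-k+1}^n \lambda_i^{(lip_2)}\tilde{\mathbf{x}}_i. \qquad (95)$$

Now assume that $f_{lip}(\mathbf{g}^{(1)},\mathbf{h}^{(1)}) \neq f_{lip}(\mathbf{g}^{(2)},\mathbf{h}^{(2)})$ (if they are equal we are trivially done). Further let
$f_{lip}(\mathbf{g}^{(1)},\mathbf{h}^{(1)}) < f_{lip}(\mathbf{g}^{(2)},\mathbf{h}^{(2)})$ (the rest of the argument of course can trivially be flipped if $f_{lip}(\mathbf{g}^{(1)},\mathbf{h}^{(1)}) >
f_{lip}(\mathbf{g}^{(2)},\mathbf{h}^{(2)})$). We then have

$$\begin{aligned}
|f_{lip}(\mathbf{g}^{(2)},\mathbf{h}^{(2)}) - f_{lip}(\mathbf{g}^{(1)},\mathbf{h}^{(1)})| &= f_{lip}(\mathbf{g}^{(2)},\mathbf{h}^{(2)}) - f_{lip}(\mathbf{g}^{(1)},\mathbf{h}^{(1)}) \\
&= \left(\sqrt{C_{w_{up}}^2 + \sigma^2}\,\|\mathbf{g}^{(2)}\|_2 - C_{w_{up}}\|\mathbf{h}^{(2)} + \nu^{(lip_2)}\mathbf{z}^{(1)} - \lambda^{(lip_2)}\|_2 - \sum_{i=n-k+1}^n \lambda_i^{(lip_2)}\tilde{\mathbf{x}}_i\right) \\
&\quad - \left(\sqrt{C_{w_{up}}^2 + \sigma^2}\,\|\mathbf{g}^{(1)}\|_2 - C_{w_{up}}\|\mathbf{h}^{(1)} + \nu^{(lip_1)}\mathbf{z}^{(1)} - \lambda^{(lip_1)}\|_2 - \sum_{i=n-k+1}^n \lambda_i^{(lip_1)}\tilde{\mathbf{x}}_i\right) \\
&\leq \left(\sqrt{C_{w_{up}}^2 + \sigma^2}\,\|\mathbf{g}^{(2)}\|_2 - C_{w_{up}}\|\mathbf{h}^{(2)} + \nu^{(lip_2)}\mathbf{z}^{(1)} - \lambda^{(lip_2)}\|_2 - \sum_{i=n-k+1}^n \lambda_i^{(lip_2)}\tilde{\mathbf{x}}_i\right) \\
&\quad - \left(\sqrt{C_{w_{up}}^2 + \sigma^2}\,\|\mathbf{g}^{(1)}\|_2 - C_{w_{up}}\|\mathbf{h}^{(1)} + \nu^{(lip_2)}\mathbf{z}^{(1)} - \lambda^{(lip_2)}\|_2 - \sum_{i=n-k+1}^n \lambda_i^{(lip_2)}\tilde{\mathbf{x}}_i\right) \\
&= \sqrt{C_{w_{up}}^2 + \sigma^2}\,(\|\mathbf{g}^{(2)}\|_2 - \|\mathbf{g}^{(1)}\|_2) - C_{w_{up}}\left(\|\mathbf{h}^{(2)} + \nu^{(lip_2)}\mathbf{z}^{(1)} - \lambda^{(lip_2)}\|_2 - \|\mathbf{h}^{(1)} + \nu^{(lip_2)}\mathbf{z}^{(1)} - \lambda^{(lip_2)}\|_2\right) \\
&\leq \sqrt{C_{w_{up}}^2 + \sigma^2}\,\|\mathbf{g}^{(2)} - \mathbf{g}^{(1)}\|_2 + C_{w_{up}}\|\mathbf{h}^{(2)} - \mathbf{h}^{(1)}\|_2 \\
&\leq \sqrt{2C_{w_{up}}^2 + \sigma^2}\,\sqrt{\|\mathbf{g}^{(2)} - \mathbf{g}^{(1)}\|_2^2 + \|\mathbf{h}^{(2)} - \mathbf{h}^{(1)}\|_2^2}, \qquad (96)
\end{aligned}$$

where the first inequality follows by sub-optimality of $\nu^{(lip_2)}$ and $\lambda^{(lip_2)}$ in (92) (since (92) is a maximization, evaluating its objective at a feasible but possibly sub-optimal point can only lower its value). Connecting beginning and
end in (96) and combining it with (92) one then has that $\xi_{up}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},C_{w_{up}})$ is Lipschitz with $c_{lip} =
\sqrt{2C_{w_{up}}^2 + \sigma^2}$. (91) then easily follows by Lemma 3. □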



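We continue by following the line of arguments right after (56). As stated there, $P(\mathbf{h}_{n+1}\sigma \geq
-\epsilon_1^{(h)}\sqrt{n}) \geq 1 - e^{-\epsilon_2^{(h)} n}$ where $\epsilon_1^{(h)} > 0$ is an arbitrarily small constant and $\epsilon_2^{(h)}$ is a constant dependent on $\epsilon_1^{(h)}$ and $\sigma$
but independent of $n$. Set

$$r = \zeta_{obj}^{(upper)} = (1 + \epsilon_{lip})E\xi_{up}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},C_{w_{up}}) + \epsilon_1^{(h)}\sqrt{n} + \epsilon_1^{(g)}\sqrt{n}. \qquad (97)$$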



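One then has after combining (89) and Lemma 7

$$\begin{aligned}
p_u &= P\left(r + \mathbf{h}_{n+1}\sigma - \epsilon_1^{(g)}\sqrt{n} - \max_{\lambda^{(2)} \in \Lambda^{(2)},\, \nu \geq 0}\left(\sqrt{C_{w_{up}}^2 + \sigma^2}\,\|\mathbf{g}\|_2 - C_{w_{up}}\|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2 - \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i\right) \geq 0\right) \\
&\geq P\left((1 + \epsilon_{lip})E\xi_{up}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},C_{w_{up}}) \geq \max_{\lambda^{(2)} \in \Lambda^{(2)},\, \nu \geq 0}\left(\sqrt{C_{w_{up}}^2 + \sigma^2}\,\|\mathbf{g}\|_2 - C_{w_{up}}\|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2 - \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i\right)\right)\left(1 - e^{-\epsilon_2^{(h)} n}\right) \\
&\geq \left(1 - \exp\left\{-\frac{(\epsilon_{lip} E\xi_{up}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},C_{w_{up}}))^2}{2(2C_{w_{up}}^2 + \sigma^2)}\right\}\right)\left(1 - e^{-\epsilon_2^{(h)} n}\right). \qquad (98)
\end{aligned}$$

As stated after (20), (98) is conceptually enough to establish a "high probability" upper bound on $\zeta_{obj}$. What
is left is to connect it with (84). Combining (98), (85), and (84) we then obtain

$$P\left(f_{obj}^{(up)} > 0\right) \geq \left(1 - \exp\left\{-\frac{(\epsilon_{lip} E\xi_{up}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},C_{w_{up}}))^2}{2(2C_{w_{up}}^2 + \sigma^2)}\right\}\right)\left(1 - e^{-\epsilon_2^{(h)} n}\right)\left(1 - e^{-\epsilon_2^{(g)} n}\right), \qquad (99)$$

where we used the fact that $g$ is the standard normal and therefore $P(g - \epsilon_1^{(g)}\sqrt{n} \leq 0) \geq (1 - e^{-\epsilon_2^{(g)} n})$ for
an arbitrarily small $\epsilon_1^{(g)} > 0$ and a constant $\epsilon_2^{(g)}$ dependent on $\epsilon_1^{(g)}$ but independent of $n$.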
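The standard normal tail bounds invoked for $g$ and $\mathbf{h}_{n+1}$ can be checked directly. The sketch below evaluates $P(g \geq \epsilon_1\sqrt{n})$ exactly through the complementary error function and compares it with $e^{-\epsilon_2 n}$ for the choice $\epsilon_2 = \epsilon_1^2/2$; that particular choice of $\epsilon_2$ (the usual Chernoff bound) is an assumption made for the illustration, since the text only requires that some such constant exists.

```python
import math


def gaussian_upper_tail(t):
    """P(g >= t) for a standard normal g, via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def check_tail(eps1=0.1, sizes=(100, 400, 1600, 6400)):
    """Compare the exact tail P(g >= eps1*sqrt(n)) with exp(-eps2*n)."""
    eps2 = eps1 ** 2 / 2.0  # assumed exponent constant (Chernoff bound)
    rows = []
    for n in sizes:
        exact = gaussian_upper_tail(eps1 * math.sqrt(n))
        bound = math.exp(-eps2 * n)
        rows.append((n, exact, bound))
    return rows


rows = check_tail()
for n, exact, bound in rows:
    print(f"n = {n:5d}: exact tail = {exact:.3e}, exp(-eps2*n) = {bound:.3e}")
```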

We are now in position to summarize results from this subsection in the following lemma which is 
essentially an "upper-bound" analogue to Lemma 5. 

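Lemma 8. Let $\mathbf{v}$ be an $m \times 1$ vector of i.i.d. zero-mean variance $\sigma^2$ Gaussian random variables and let
$A$ be an $m \times n$ matrix of i.i.d. standard normal random variables. Consider an $\tilde{\mathbf{x}}$ defined in (7) and a $\mathbf{y}$
defined in (3) for $\mathbf{x} = \tilde{\mathbf{x}}$. Let then $\zeta_{obj}$ be as defined in (10) and let $\hat{\mathbf{w}}$ be the solution of (14). There is a
constant $\epsilon_{upper} > 0$ such that

$$P\left(\zeta_{obj} \leq \zeta_{obj}^{(upper)}\right) \geq 1 - e^{-\epsilon_{upper} n}, \qquad (100)$$

where

$$\zeta_{obj}^{(upper)} = (1 + \epsilon_{lip})E\xi_{up}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},C_{w_{up}}) + \epsilon_1^{(h)}\sqrt{n} + \epsilon_1^{(g)}\sqrt{n}, \qquad (101)$$

$\xi_{up}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},C_{w_{up}})$ is as defined in (90), $\epsilon_{lip}$, $\epsilon_1^{(h)}$, $\epsilon_1^{(g)}$ are all positive arbitrarily small constants, and
$C_{w_{up}}$ is a constant such that $\|\mathbf{w}\|_2 \leq C_{w_{up}}$.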

Proof. Follows from the previous discussion. □ 



2.3 Matching upper and lower bounds 

In this section we specialize the general bounds introduced above and show how they can match each other.
We will divide the presentation into three subsections. In the first of the subsections we will make a connection to
the noiseless case and show how one can then remove the constraint from (45), (46), and (47). In the second
subsection we will consider a $\mathbf{w}$ such that $|\|\mathbf{w}\|_2 - \|\hat{\mathbf{w}}\|_2| \geq \epsilon_{w_{up}}\|\hat{\mathbf{w}}\|_2$. We will then quantify how much
the lower bound that can be computed for such a $\mathbf{w}$ through the framework presented in Section 2.1 deviates
from the optimal one obtained for $\hat{\mathbf{w}}$. In the last subsection we will then show that there will be a $\mathbf{w}$ such
that the upper bound computed through the framework presented in Section 2.2 will deviate less. That will
in essence establish that the upper and lower bounds computed in the previous sections indeed match. We will
then draw conclusions about the consequences that such a matching of the bounds has for a couple of
LASSO parameters.



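2.3.1 Connection to the $\ell_1$ optimization

In this subsection we establish a connection between the constraint in (45), (46), and (47) and the fundamental
performance characterization of $\ell_1$ optimization derived in [62] (and of course earlier in the context
of neighborly polytopes in [25]). We first recall the condition from Lemma 2. The condition states

$$\sqrt{1 + \frac{\sigma^2}{C_w^2}}\,\|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2 \leq \|\mathbf{g}\|_2, \qquad (102)$$

where $C_w$ is an arbitrarily large constant and $\hat{\nu}$ and $\hat{\lambda}^{(2)}$ are the solutions of

$$\begin{aligned}
\max_{\nu,\lambda^{(2)}} \quad & \sigma\sqrt{\|\mathbf{g}\|_2^2 - \|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2^2} - \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i \\
\text{subject to} \quad & 0 \leq \lambda_i^{(2)} \leq 2\nu, \quad 1 \leq i \leq n \\
& \nu \geq 0. \qquad (103)
\end{aligned}$$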
Now we note the following equivalent to (103) for the case when the nonzero components of $\tilde{\mathbf{x}}$ are infinite

$$\begin{aligned}
\max_{\nu,\lambda^{(2)}} \quad & \sigma\sqrt{\|\mathbf{g}\|_2^2 - \|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2^2} \\
\text{subject to} \quad & 0 \leq \lambda_i^{(2)} \leq 2\nu, \quad 1 \leq i \leq n - k \\
& \lambda_i^{(2)} = 0, \quad n - k + 1 \leq i \leq n \\
& \nu \geq 0. \qquad (104)
\end{aligned}$$

Now, to make the new observations easily comparable to the corresponding ones from [61, 63] we set

$$\bar{\mathbf{h}} = [|\mathbf{h}|_{(1)}, |\mathbf{h}|_{(2)}, \dots, |\mathbf{h}|_{(n-k)}, \mathbf{h}_{n-k+1}, \mathbf{h}_{n-k+2}, \dots, \mathbf{h}_n]^T, \qquad (105)$$

where $[|\mathbf{h}|_{(1)}, |\mathbf{h}|_{(2)}, \dots, |\mathbf{h}|_{(n-k)}]$ are the magnitudes of $[\mathbf{h}_1, \mathbf{h}_2, \dots, \mathbf{h}_{n-k}]$ sorted in increasing order (possible
ties in the sorting process are of course broken arbitrarily). Also we let $\mathbf{z}^{(2)}$ be such that $\mathbf{z}_i^{(2)} = -\mathbf{z}_i^{(1)}$, $n -
k + 1 \leq i \leq n$, and $\mathbf{z}_i^{(2)} = \mathbf{z}_i^{(1)}$, $1 \leq i \leq n - k$. It is then relatively easy to see that the above optimization
problem is equivalent to

$$\begin{aligned}
\max_{\nu,\lambda^{(2)}} \quad & \sigma\sqrt{\|\mathbf{g}\|_2^2 - \|\bar{\mathbf{h}} - \nu\mathbf{z}^{(2)} + \lambda^{(2)}\|_2^2} \\
\text{subject to} \quad & 0 \leq \lambda_i^{(2)} \leq \nu, \quad 1 \leq i \leq n - k \\
& \lambda_i^{(2)} = 0, \quad n - k + 1 \leq i \leq n \\
& \nu \geq 0. \qquad (106)
\end{aligned}$$

Let $\nu_{\ell_1}$ and $\lambda^{(\ell_1)}$ be the solution of the above maximization. Then, as we showed in [63] and [62], the
inequality

$$E\|\mathbf{g}\|_2 > E\|\bar{\mathbf{h}} - \nu_{\ell_1}\mathbf{z}^{(2)} + \lambda^{(\ell_1)}\|_2 \qquad (107)$$

establishes the following fundamental performance characterization of the $\ell_1$ optimization algorithm from
(2) that could be used instead of LASSO to recover $\mathbf{x}$ in (1) (which is a noiseless version of (3))

$$(1 - \beta_w)\frac{\sqrt{\frac{2}{\pi}}\,e^{-\left(\mathrm{erfinv}\left(\frac{1 - \alpha_w}{1 - \beta_w}\right)\right)^2}}{\alpha_w} - \sqrt{2}\,\mathrm{erfinv}\left(\frac{1 - \alpha_w}{1 - \beta_w}\right) = 0, \qquad (108)$$

where of course $\alpha_w = \frac{m}{n}$ and $\beta_w = \frac{k}{n}$. As is also shown in [63] and [62], both of the quantities under the
expected values in (107) nicely concentrate. Then with overwhelming probability one has that for any pair
$(\alpha, \beta)$ that satisfies (or lies below) the above fundamental performance characterization of $\ell_1$ optimization

$$\|\mathbf{g}\|_2 > \|\bar{\mathbf{h}} - \nu_{\ell_1}\mathbf{z}^{(2)} + \lambda^{(\ell_1)}\|_2. \qquad (109)$$

Moreover, since $\hat{\lambda}_i^{(2)} \geq 0$, $n - k + 1 \leq i \leq n$, in (103) one actually has that (109) implies that with
overwhelming probability

$$\|\mathbf{g}\|_2 > \|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2, \qquad (110)$$

which for sufficiently large $C_w$ is the same as (102). We then in what follows assume that the pair $(\alpha, \beta)$ is such
that it satisfies the fundamental $\ell_1$ optimization performance characterization (or is in the region below it)
and therefore proceed by ignoring the condition (102). (Strictly speaking, all our overwhelming probabilities
below should be multiplied with an overwhelming probability that (108) holds; to keep the writing simple
we will skip this detail.)
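To make the characterization (108) concrete, the following sketch solves it numerically for $\beta_w$ at a given $\alpha_w$. It is only an illustration under stated assumptions: the function names are made up, the inverse error function is obtained by bisection on `math.erf` because the Python standard library does not provide one, and the root is located by bisection using the fact that the left-hand side of (108) decreases in $\beta_w$ on $(0, \alpha_w)$.

```python
import math


def erfinv(y, tol=1e-14):
    """Inverse of math.erf on [0, 1), computed by bisection."""
    lo, hi = 0.0, 6.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def characterization(alpha, beta):
    """Left-hand side of (108) for alpha_w = alpha and beta_w = beta."""
    x = erfinv((1.0 - alpha) / (1.0 - beta))
    return ((1.0 - beta) * math.sqrt(2.0 / math.pi) * math.exp(-x * x) / alpha
            - math.sqrt(2.0) * x)


def weak_threshold(alpha, tol=1e-10):
    """Value of beta in (0, alpha) at which the left-hand side of (108) vanishes."""
    lo, hi = 0.0, alpha * (1.0 - 1e-9)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if characterization(alpha, mid) > 0.0:
            lo = mid  # still below the curve, move up
        else:
            hi = mid
    return 0.5 * (lo + hi)


beta_half = weak_threshold(0.5)
print(f"alpha_w = 0.5 -> beta_w on the curve (108) is about {beta_half:.4f}")
```

Pairs $(\alpha, \beta)$ with $\beta$ smaller than the returned value lie below the curve, which is the regime assumed in the rest of the analysis.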
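2.3.2 Deviation from the lower-bound

In this subsection we show that $\|\mathbf{w}_{lasso}\|_2$ can not deviate substantially from $\|\hat{\mathbf{w}}\|_2$ without substantially
affecting the value of the lower bound on the objective in (6) that is derived in Section 2.1. To that end let
us assume that there is a $\mathbf{w}_{off}$ that is the solution of the LASSO from (6) (or to be slightly more precise
that is such that $\hat{\mathbf{x}} = \tilde{\mathbf{x}} + \mathbf{w}_{off}$, where obviously $\hat{\mathbf{x}}$ is the solution of (6)). Further, let $|\|\mathbf{w}_{off}\|_2 - \|\hat{\mathbf{w}}\|_2| \geq
\epsilon_{w_{up}}\|\hat{\mathbf{w}}\|_2$, where $\epsilon_{w_{up}}$ is an arbitrarily small constant.

One can then proceed by repeating the same line of thought as in Section 2.1. The only difference will
be that now $C_w = \|\mathbf{w}_{off}\|_2$ and consequently in the definition of $S_w(\sigma,\tilde{\mathbf{x}},C_w)$, $\|\mathbf{w}\|_2 \leq C_w$ changes to
$\|\mathbf{w}\|_2 = C_w = \|\mathbf{w}_{off}\|_2$. This difference will of course not affect the concept presented in Section 2.1. The
only real consequence will be the change of (22). Adapted to the new scenario, and writing $w_{off}$ for the scalar $\|\mathbf{w}_{off}\|_2$, (22) becomes

$$\begin{aligned}
\xi_{off}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},w_{off}) = \min_{\mathbf{w},\mathbf{t}} \quad & \sqrt{\|\mathbf{w}_{off}\|_2^2 + \sigma^2}\,\|\mathbf{g}\|_2 + \sum_{i=1}^n \mathbf{h}_i\mathbf{w}_i \\
\text{subject to} \quad & \sum_{i=1}^n \mathbf{t}_i \leq \|\tilde{\mathbf{x}}\|_1 \\
& \tilde{\mathbf{x}}_i + \mathbf{w}_i - \mathbf{t}_i \leq 0, \quad n - k + 1 \leq i \leq n \\
& -\tilde{\mathbf{x}}_i - \mathbf{w}_i - \mathbf{t}_i \leq 0, \quad n - k + 1 \leq i \leq n \\
& \mathbf{w}_i - \mathbf{t}_i \leq 0, \quad 1 \leq i \leq n - k \\
& -\mathbf{w}_i - \mathbf{t}_i \leq 0, \quad 1 \leq i \leq n - k \\
& \sqrt{\|\mathbf{w}\|_2^2 + \sigma^2} \leq \sqrt{w_{off}^2 + \sigma^2}. \qquad (111)
\end{aligned}$$

One can then proceed further with solving the Lagrangian to obtain

$$\xi_{off}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},w_{off}) = \max_{\lambda^{(2)} \in \Lambda^{(2)},\, \nu \geq 0}\left(\sqrt{w_{off}^2 + \sigma^2}\,\|\mathbf{g}\|_2 - w_{off}\|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2 - \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i\right). \qquad (112)$$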

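Using the probabilistic arguments from Section 2.1 one then from Lemma 5 has that if $\mathbf{w}_{off}$ is the solution of
(6) then its objective value with overwhelming probability is lower bounded by $(1 - \epsilon_{lip})E\xi_{off}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},w_{off})$
($\xi_{off}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},w_{off})$ is structurally the same as $\xi_{up}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},C_{w_{up}})$ from (90) and therefore easily
concentrates based on Lemma 7). We will now consider in parallel the following lower bound from (42) (clearly,
choosing $w_{off} = \|\hat{\mathbf{w}}\|_2$ would make (112) equivalent to (42)).

$$\xi_{ov}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) = \max_{\nu \geq 0,\, \lambda^{(2)} \in \Lambda^{(2)}} \sigma\sqrt{\|\mathbf{g}\|_2^2 - \|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2^2} - \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i. \qquad (113)$$

Now, let as usual $\hat{\nu}$ and $\hat{\lambda}^{(2)}$ be the solutions of (113). Let

$$\xi_{help}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},w_{off}) = \sqrt{w_{off}^2 + \sigma^2}\,\|\mathbf{g}\|_2 - w_{off}\|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2 - \sum_{i=n-k+1}^n \hat{\lambda}_i^{(2)}\tilde{\mathbf{x}}_i. \qquad (114)$$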
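Then

$$\begin{aligned}
\xi_{off}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},w_{off}) - \xi_{ov}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) &\geq \xi_{help}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},w_{off}) - \xi_{ov}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) \\
&= \sqrt{w_{off}^2 + \sigma^2}\,\|\mathbf{g}\|_2 - w_{off}\|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2 - \sigma\sqrt{\|\mathbf{g}\|_2^2 - \|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2^2}. \qquad (115)
\end{aligned}$$

For simplicity let $|\|\mathbf{w}_{off}\|_2 - \|\hat{\mathbf{w}}\|_2| = \epsilon_{w_{up}}\|\hat{\mathbf{w}}\|_2$ (this restriction is clearly more conservative than
$|\|\mathbf{w}_{off}\|_2 - \|\hat{\mathbf{w}}\|_2| \geq \epsilon_{w_{up}}\|\hat{\mathbf{w}}\|_2$). Now, we switch to expectations and ignore all $\epsilon$'s except $\epsilon_{w_{up}}$. Since every
quantity that we will consider (see (55)) concentrates, the $\epsilon$'s in the concentration inequalities can be made arbitrarily
close to zero; moreover, once $\epsilon_{w_{up}}$ is fixed all other $\epsilon$'s can be made arbitrarily small compared to $\epsilon_{w_{up}}$. Also,
we will show the derivation for $w_{off} = (1 + \epsilon_{w_{up}})\|\hat{\mathbf{w}}\|_2$ (the derivation for the case $w_{off} = (1 - \epsilon_{w_{up}})\|\hat{\mathbf{w}}\|_2$
is completely analogous).

Now, to facilitate writing we then set all $\epsilon$'s except $\epsilon_{w_{up}}$ to zero. We then have

$$\begin{aligned}
E\xi_{off}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},w_{off}) - E\xi_{ov}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) &\geq E\xi_{help}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},w_{off}) - E\xi_{ov}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) \\
&\doteq \sqrt{(1 + \epsilon_{w_{up}})^2(E\|\hat{\mathbf{w}}\|_2)^2 + \sigma^2}\,E\|\mathbf{g}\|_2 - (1 + \epsilon_{w_{up}})E\|\hat{\mathbf{w}}\|_2\,E\|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2 \\
&\quad - \sigma\sqrt{(E\|\mathbf{g}\|_2)^2 - (E\|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2)^2}, \qquad (116)
\end{aligned}$$

where $\doteq$ means that the equality is not exact but for a fixed $\epsilon_{w_{up}}$ can be made as close to it as needed. In a
similar fashion we have

$$E\|\hat{\mathbf{w}}\|_2 \doteq \frac{\sigma E\|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2}{\sqrt{(E\|\mathbf{g}\|_2)^2 - (E\|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2)^2}}. \qquad (117)$$

Before we proceed further we simplify the notation with the following change of variables.

$$\begin{aligned}
g_E &= E\|\mathbf{g}\|_2 \\
h_E &= E\|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2 \\
\xi_E &= \sigma\sqrt{(E\|\mathbf{g}\|_2)^2 - (E\|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2)^2} = \sigma\sqrt{g_E^2 - h_E^2} \\
w_E &= E\|\hat{\mathbf{w}}\|_2 = \frac{\sigma h_E}{\sqrt{g_E^2 - h_E^2}} = \frac{\sigma^2 h_E}{\xi_E}. \qquad (118)
\end{aligned}$$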

From (118) one easily has 



f 2 

V E = 9 E -f 2 
WE = (119) 
Then a combination of (116), (118), and (119) gives



/ h 2 a A 

E£ he i p (a,g,h,±,w off ) - £f OT (<T,g,h,x) = J (1 + e Wup ) 2 (-^— ) + <? 2 9e - (1 + e Wup )w E h E - £e 



(1 + ew up J^ — i/l- — — s 2 2 2 ~ i 1 + e w up )^ — + ew up 4£- (120) 

V (1 + e -Wup) 9E a & 



Now, assuming that e w is small (and recognizing that £g < g_E<7) from (120) we have 



{I -+■ £\v up ) t \ 1 \2 2 2 ' e Wu P ) e + e w„ p ?E 



.,2 „2 t2 (o, , ,2 \ 2 2 



~ (1 + Ew„ p J-T (1 — , Tg"^ — o / _ ^ + e w up J— 7 1" e w up C,E 

tE{2e Wup + ej up ) = 2e Witp g g (l + e Wup ) - gg(2e w ^ + ej up ) _ frg<4 up 

2(l + e Wup ) +£w ^ 2(l + ewup ) 2(l + 6 Wup )- ( j 

Combining (115), (120), and (121) we finally have 

e 2 e 2 

££ o// (<T,g,h,x,w o// ) - £U(a,g,h,x) > — - E£ E > ^ fl^fog.h.x) (122) 

where the last inequality follows by noting that in the definition of $\xi_{ov}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}})$ the elements of $\tilde{\mathbf{x}}$ and $\lambda^{(2)}$
are non-negative.
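The mechanism behind (116) can be illustrated with plain numbers: in the notation of (118), the gap $\sqrt{w_{off}^2 + \sigma^2}\,g_E - w_{off}h_E - \xi_E$ vanishes at $w_{off} = w_E$ and is strictly positive once $w_{off}$ is moved away from $w_E$ in either direction. The sketch below only checks this sign behavior; the numerical values of `g_e`, `h_e`, and `sigma` are arbitrary assumptions, and the code does not attempt to reproduce the specific constants of the subsequent derivation.

```python
import math


def lower_bound_gap(eps, g_e=1.0, h_e=0.6, sigma=0.5):
    """Right-hand side of (116) in the notation of (118), for w_off = (1 + eps) * w_E."""
    s = math.sqrt(g_e ** 2 - h_e ** 2)
    xi_e = sigma * s            # xi_E = sigma * sqrt(g_E^2 - h_E^2)
    w_e = sigma * h_e / s       # w_E  = sigma * h_E / sqrt(g_E^2 - h_E^2)
    w_off = (1.0 + eps) * w_e
    return math.sqrt(w_off ** 2 + sigma ** 2) * g_e - w_off * h_e - xi_e


gaps = {eps: lower_bound_gap(eps) for eps in (-0.2, -0.1, 0.0, 0.1, 0.2)}
for eps, gap in gaps.items():
    print(f"eps = {eps:+.1f}: gap = {gap:.6f}")
```

The gap is a convex function of $w_{off}$ whose minimum value, zero, is attained at $w_E$, which is why any fixed relative deviation $\epsilon_{w_{up}}$ raises the lower bound by a fixed positive amount.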

Now, roughly speaking, (122) shows that if $\|\mathbf{w}_{lasso}\|_2$ were to deviate from $\|\hat{\mathbf{w}}\|_2$ the optimal value
of the objective in (6) would be higher than the lower bound derived in Section 2.1. We summarize these
observations in the following lemma (essentially a deviating equivalent of Lemma 5 from Section 2.1).

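Lemma 9. Let $\mathbf{v}$ be an $m \times 1$ vector of i.i.d. zero-mean variance $\sigma^2$ Gaussian random variables and let
$A$ be an $m \times n$ matrix of i.i.d. standard normal random variables. Consider an $\tilde{\mathbf{x}}$ defined in (7) and a $\mathbf{y}$
defined in (3) for $\mathbf{x} = \tilde{\mathbf{x}}$. Let then $\zeta_{obj}$ be as defined in (10) or (14) and let $\mathbf{w}_{off}$ be the solution of (14).
Let $\alpha$ and $\beta$ be below the fundamental characterization (108) and let $\hat{\mathbf{w}}$ be as defined in (46). Assume that
$|\|\mathbf{w}_{off}\|_2 - \|\hat{\mathbf{w}}\|_2| \geq \epsilon_{w_{up}}\|\hat{\mathbf{w}}\|_2$, where $\epsilon_{w_{up}}$ is an arbitrarily small but fixed constant. Then there would
be a constant $\epsilon_{off} > 0$, and arbitrarily small positive constants $\epsilon_{lip}$, $\epsilon_1^{(h)}$, $\epsilon_1^{(g)}$, such that

$$P\left(\zeta_{obj} \geq \zeta_{obj}^{(off)}\right) \geq 1 - e^{-\epsilon_{off} n}, \qquad (123)$$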

where 

CSf } = (1 - + J £U(g,g,h,x) - 4 h) ^- e^v^, (124) 

2(1 + e Wup ) 

and $\xi_{ov}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}})$ is as defined in (42) (or (113)).

Proof. Follows from the previous discussion, discussion from Section 2.3.1, and a combination of (112),
(115), (122), arguments right after (112), and Lemma 5. □
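2.3.3 Deviation of the upper bound

In this section we will show that $\|\mathbf{w}_{lasso}\|_2$ can not deviate from $\|\hat{\mathbf{w}}\|_2$ as much as it was assumed in the
previous section. To do so we will actually continue to assume that it can and then eventually reach a
contradiction. As in the previous section, let then $|\|\mathbf{w}_{off}\|_2 - \|\hat{\mathbf{w}}\|_2| \geq \epsilon_{w_{up}}\|\hat{\mathbf{w}}\|_2$, where $\epsilon_{w_{up}}$ is an arbitrarily
small constant. Further, let $\xi_{dual}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}})$ be

$$\begin{aligned}
\xi_{dual}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) = \min_{d \geq 0} \max_{\nu,\lambda^{(2)}} \quad & \sqrt{d^2 + \sigma^2}\,\|\mathbf{g}\|_2 - d\|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2 - \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i \\
\text{subject to} \quad & \nu \geq 0 \\
& 0 \leq \lambda_i^{(2)} \leq 2\nu, \quad 1 \leq i \leq n. \qquad (125)
\end{aligned}$$

Rewriting (125) with a simple sign flipping turns out to be useful in what follows

$$\begin{aligned}
-\xi_{dual}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) = \max_{d \geq 0} \min_{\nu,\lambda^{(2)}} \quad & -\sqrt{d^2 + \sigma^2}\,\|\mathbf{g}\|_2 + d\|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2 + \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i \\
\text{subject to} \quad & \nu \geq 0 \\
& 0 \leq \lambda_i^{(2)} \leq 2\nu, \quad 1 \leq i \leq n. \qquad (126)
\end{aligned}$$

The following lemma provides a powerful tool to deal with (126).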
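Lemma 10. Let $\xi_{dual}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}})$ be as defined in (126). Further, let

$$\begin{aligned}
-\xi_{ov}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) = \min_{\nu,\lambda^{(2)}} \max_{d \geq 0} \quad & -\sqrt{d^2 + \sigma^2}\,\|\mathbf{g}\|_2 + d\|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2 + \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i \\
\text{subject to} \quad & \nu \geq 0 \\
& 0 \leq \lambda_i^{(2)} \leq 2\nu, \quad 1 \leq i \leq n. \qquad (127)
\end{aligned}$$

Then

$$\xi_{dual}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) = \xi_{ov}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}). \qquad (128)$$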


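Proof. After solving the inner maximization over $d$ in (127) one has

$$d_{opt} = \sigma\frac{\|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2}{\sqrt{\|\mathbf{g}\|_2^2 - \|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2^2}}. \qquad (129)$$

Such a $d$ then establishes that the right-hand side of (127) is indeed $-\xi_{ov}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}})$, i.e., one has as in (42)

$$\begin{aligned}
-\xi_{ov}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) = \min_{\nu,\lambda^{(2)}} \quad & -\sigma\sqrt{\|\mathbf{g}\|_2^2 - \|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2^2} + \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i \\
\text{subject to} \quad & \nu \geq 0 \\
& 0 \leq \lambda_i^{(2)} \leq 2\nu, \quad 1 \leq i \leq n. \qquad (130)
\end{aligned}$$

Now we digress for a moment and consider the following optimization problem

$$\begin{aligned}
\min_{\nu,\lambda^{(2)},q_1,q_2} \quad & -\sigma q_1 + \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i \\
\text{subject to} \quad & \|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2 \leq q_2 \\
& q_1^2 + q_2^2 \leq \|\mathbf{g}\|_2^2 \\
& \nu \geq 0 \\
& 0 \leq \lambda_i^{(2)} \leq 2\nu, \quad 1 \leq i \leq n. \qquad (131)
\end{aligned}$$

Let $-\xi_{ov}^{(1)}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}})$ be the optimal value of its objective function. Let the quadruplet $\hat{\nu}, \hat{\lambda}^{(2)}, \hat{q}_1, \hat{q}_2$ be the
solution of the above optimization problem. Then it must be

$$\|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2 = \hat{q}_2 \qquad (132)$$

and consequently

$$\begin{aligned}
\hat{q}_1 &= \sqrt{\|\mathbf{g}\|_2^2 - \|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2^2} \\
-\xi_{ov}^{(1)}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) &= -\sigma\sqrt{\|\mathbf{g}\|_2^2 - \|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2^2} + \sum_{i=n-k+1}^n \hat{\lambda}_i^{(2)}\tilde{\mathbf{x}}_i. \qquad (133)
\end{aligned}$$

The above claim is rather obvious but for completeness we sketch the argument that supports it. Assume
that $\|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2 < \hat{q}_2$; then $\hat{q}_1 < \sqrt{\|\mathbf{g}\|_2^2 - \|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2^2}$ and the attained objective value $-\sigma\hat{q}_1 + \sum_{i=n-k+1}^n \hat{\lambda}_i^{(2)}\tilde{\mathbf{x}}_i$ would be
larger than the expression on the right-hand side of (133), which is itself attainable, contradicting optimality. Now, since (132) and (133) hold one has that
$-\xi_{ov}^{(1)}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}})$ can be determined through the following equivalent to (131)

$$\begin{aligned}
-\xi_{ov}^{(1)}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) = \min_{\nu,\lambda^{(2)}} \quad & -\sigma\sqrt{\|\mathbf{g}\|_2^2 - \|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2^2} + \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i \\
\text{subject to} \quad & \nu \geq 0 \\
& 0 \leq \lambda_i^{(2)} \leq 2\nu, \quad 1 \leq i \leq n. \qquad (134)
\end{aligned}$$

After comparing (130) and (134) we have

$$-\xi_{ov}^{(1)}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) = -\xi_{ov}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}). \qquad (135)$$

Now, let us write the Lagrange dual of the optimization problem in (131). Let $d$ and $\gamma_1$ be Lagrangian
variables such that

$$\begin{aligned}
\max_{d \geq 0,\, \gamma_1 \geq 0} \min_{\nu,\lambda^{(2)},q_1,q_2} \quad & -\sigma q_1 + \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i + d\|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2 - dq_2 + \gamma_1(q_1^2 + q_2^2) - \gamma_1\|\mathbf{g}\|_2^2 \\
\text{subject to} \quad & \nu \geq 0 \\
& 0 \leq \lambda_i^{(2)} \leq 2\nu, \quad 1 \leq i \leq n. \qquad (136)
\end{aligned}$$

After solving the inner minimization over $q_1, q_2$ in (136) we have

$$\begin{aligned}
\max_{d \geq 0,\, \gamma_1 \geq 0} \min_{\nu,\lambda^{(2)}} \quad & -\frac{\sigma^2 + d^2}{4\gamma_1} - \gamma_1\|\mathbf{g}\|_2^2 + \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i + d\|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2 \\
\text{subject to} \quad & \nu \geq 0 \\
& 0 \leq \lambda_i^{(2)} \leq 2\nu, \quad 1 \leq i \leq n. \qquad (137)
\end{aligned}$$

Since the first two terms in the objective function in (137) involve neither $\nu$ nor $\lambda^{(2)}$ one can then
maximize their sum over $\gamma_1$ for any $d$. After that we finally have

$$\begin{aligned}
\max_{d \geq 0} \min_{\nu,\lambda^{(2)}} \quad & -\sqrt{\sigma^2 + d^2}\,\|\mathbf{g}\|_2 + \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i + d\|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2 \\
\text{subject to} \quad & \nu \geq 0 \\
& 0 \leq \lambda_i^{(2)} \leq 2\nu, \quad 1 \leq i \leq n. \qquad (138)
\end{aligned}$$

Let $-\xi_{ov}^{(2)}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}})$ be the optimal value of the objective function in (138). Since (138) is the dual of (131)
and since strong duality obviously holds (the optimization problem in (131) is clearly convex) one has

$$-\xi_{ov}^{(2)}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) = -\xi_{ov}^{(1)}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}). \qquad (139)$$

On the other hand the optimization problem in (138) is the same as the one in (126) and therefore

$$-\xi_{ov}^{(2)}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) = -\xi_{dual}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}). \qquad (140)$$

Connecting (135), (139), and (140) one finally has

$$-\xi_{dual}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) = -\xi_{ov}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}) \qquad (141)$$

which is what is stated in (128). This concludes the proof. □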
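The closed form (129) is easy to confirm numerically for fixed $\nu$ and $\lambda^{(2)}$. In the sketch below, `g_norm` and `h_norm` stand for $\|\mathbf{g}\|_2$ and $\|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2$; their numerical values, the search interval, and the use of a ternary search (valid because the function of $d$ is convex) are assumptions made only for the illustration.

```python
import math


def inner_objective(d, g_norm, h_norm, sigma):
    """sqrt(d^2 + sigma^2) * ||g||_2 - d * ||h + nu z - lambda||_2, as in (125)."""
    return math.sqrt(d * d + sigma * sigma) * g_norm - d * h_norm


def minimize_over_d(g_norm, h_norm, sigma, d_max=1e3, iters=200):
    """Ternary search for the minimizer over d >= 0 of the convex inner objective."""
    lo, hi = 0.0, d_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if inner_objective(m1, g_norm, h_norm, sigma) <= inner_objective(m2, g_norm, h_norm, sigma):
            hi = m2
        else:
            lo = m1
    d = 0.5 * (lo + hi)
    return d, inner_objective(d, g_norm, h_norm, sigma)


g_norm, h_norm, sigma = 1.3, 0.9, 0.5   # illustrative values with g_norm > h_norm
d_num, val_num = minimize_over_d(g_norm, h_norm, sigma)
d_closed = sigma * h_norm / math.sqrt(g_norm ** 2 - h_norm ** 2)   # formula (129)
val_closed = sigma * math.sqrt(g_norm ** 2 - h_norm ** 2)          # value appearing in (130)
print(f"d: numeric {d_num:.6f} vs closed form {d_closed:.6f}")
print(f"value: numeric {val_num:.6f} vs closed form {val_closed:.6f}")
```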

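Let $\hat{d}$, $\hat{\nu}$, $\hat{\lambda}^{(2)}$ be the solution of (125). Clearly, $\hat{d} = \|\hat{\mathbf{w}}\|_2 = \sigma\frac{\|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2}{\sqrt{\|\mathbf{g}\|_2^2 - \|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2^2}}$ and since all
quantities concentrate $E\hat{d} = E\|\hat{\mathbf{w}}\|_2 \doteq \sigma\frac{E\|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2}{\sqrt{(E\|\mathbf{g}\|_2)^2 - (E\|\mathbf{h} + \hat{\nu}\mathbf{z}^{(1)} - \hat{\lambda}^{(2)}\|_2)^2}}$. Now, set $C_{w_{up}} = E\|\hat{\mathbf{w}}\|_2$ in (90).
Then a combination of (90), (125), and Lemma 10 gives

$$\begin{aligned}
E\xi_{up}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}},E\|\hat{\mathbf{w}}\|_2) &= E\max_{\lambda^{(2)} \in \Lambda^{(2)},\, \nu \geq 0}\left(\sqrt{(E\|\hat{\mathbf{w}}\|_2)^2 + \sigma^2}\,\|\mathbf{g}\|_2 - E\|\hat{\mathbf{w}}\|_2\,\|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2 - \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i\right) \\
&= E\max_{\lambda^{(2)} \in \Lambda^{(2)},\, \nu \geq 0}\left(\sqrt{(E\hat{d})^2 + \sigma^2}\,\|\mathbf{g}\|_2 - E\hat{d}\,\|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2 - \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i\right) \\
&\doteq E\min_{d \geq 0}\max_{\lambda^{(2)} \in \Lambda^{(2)},\, \nu \geq 0}\left(\sqrt{d^2 + \sigma^2}\,\|\mathbf{g}\|_2 - d\|\mathbf{h} + \nu\mathbf{z}^{(1)} - \lambda^{(2)}\|_2 - \sum_{i=n-k+1}^n \lambda_i^{(2)}\tilde{\mathbf{x}}_i\right) = E\xi_{ov}(\sigma,\mathbf{g},\mathbf{h},\tilde{\mathbf{x}}). \qquad (142)
\end{aligned}$$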

Combining Lemma 14 and (142) one has that with overwhelming probability there is a $w$ such that the
objective in (6) is upper bounded by a quantity arbitrarily close from above to $E\xi_{ov}(\sigma, g, h, \tilde{x})$. On the
other hand Lemma 5 states that for any $w$ such that $|\|w\|_2 - \|\hat{w}\|_2| > \epsilon_{w_{up}} \|\hat{w}\|_2$, $\epsilon_{w_{up}} > 0$, the objective
value of (6) is with overwhelming probability lower bounded by a quantity that is arbitrarily close from

4 

below to (1 + 9n _ri" p — ^)E^ ov (a, g, h,x). Clearly then the assumption of Lemma 5 is unsustainable and 

one has that $\|w_{lasso}\|_2$ cannot deviate substantially from $\|\hat{w}\|_2$. This then implies that with overwhelming
probability the objective value of (6) concentrates around $E\xi_{ov}(\sigma, g, h, \tilde{x})$ and consequently that $\|w_{lasso}\|_2$
concentrates around $E\|\hat{w}\|_2$.

2.4 Connecting all pieces 

In this section we connect all of the above. We will summarize the results obtained so far in the following 
theorem. 

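Theorem 1. Let $v$ be an $m \times 1$ vector of i.i.d. zero-mean variance $\sigma^2$ Gaussian random variables and
let $A$ be an $m \times n$ matrix of i.i.d. standard normal random variables. Further, let $g$ and $h$ be $m \times 1$
and $n \times 1$ vectors of i.i.d. standard normals, respectively. Consider a $k$-sparse $\tilde{x}$ defined in (7) and a $y$
defined in (3) for $x = \tilde{x}$. Let the solution of (6) be $\hat{x}$ and let the so-called error vector of LASSO from
(6) be $w_{lasso} = \hat{x} - \tilde{x}$. Let $n$ be large and let constants $\alpha = \frac{m}{n}$ and $\beta = \frac{k}{n}$ be below the fundamental
characterization (108). Consider the following optimization problem:

$$
\begin{aligned}
\xi_{ov}(\sigma, g, h, \tilde{x}) = \max_{\nu, \lambda^{(2)}} \quad & \sigma \sqrt{\|g\|_2^2 - \|h + \nu z^{(1)} - \lambda^{(2)}\|_2^2} - \sum_{i=n-k+1}^{n} \lambda_i^{(2)} \tilde{x}_i \\
\text{subject to} \quad & \nu \ge 0 \\
& 0 \le \lambda_i^{(2)} \le 2\nu, \ 1 \le i \le n.
\end{aligned}
\tag{143}
$$

Let $\hat{\nu}$ and $\hat{\lambda}^{(2)}$ be the solution of (143). Set

$$\|\hat{w}\|_2 = \sigma \frac{\|h + \hat{\nu} z^{(1)} - \hat{\lambda}^{(2)}\|_2}{\sqrt{\|g\|_2^2 - \|h + \hat{\nu} z^{(1)} - \hat{\lambda}^{(2)}\|_2^2}}. \tag{144}$$

Then:

$$P\left((1 - \epsilon_1^{(lasso)}) E\xi_{ov}(\sigma, g, h, \tilde{x}) \le \|y - A\hat{x}\|_2 \le (1 + \epsilon_1^{(lasso)}) E\xi_{ov}(\sigma, g, h, \tilde{x})\right) = 1 - e^{-\epsilon_2^{(lasso)} n} \tag{145}$$

and

$$P\left((1 - \epsilon_1^{(lasso)}) E\|\hat{w}\|_2 \le \|w_{lasso}\|_2 \le (1 + \epsilon_1^{(lasso)}) E\|\hat{w}\|_2\right) = 1 - e^{-\epsilon_2^{(lasso)} n}, \tag{146}$$

where $\epsilon_1^{(lasso)} > 0$ is an arbitrarily small constant and $\epsilon_2^{(lasso)}$ is a constant dependent on $\epsilon_1^{(lasso)}$ and $\sigma$ but
independent of $n$.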

Proof. Follows from the above discussion and a combination of (42), Lemma 2, discussion in Section 2.3.1, 
and Lemmas 5, 14, and 10. □ 

It may not be immediately clear, but the result presented in the above theorem is incredibly powerful.
Among other things, it enables one to precisely estimate the norm of the error vector in "noisy" under-determined
systems of linear equations. Moreover, it can do so for any given $k$-sparse vector $\tilde{x}$. Furthermore,
all of it is done through a transformation of the original LASSO from (6) to a much simpler optimization
program (143). While many quantities of interest in LASSO recovery can be computed through the mechanism
presented above, below we focus only on quantities that relate to what we will call LASSO's generic
performance. Computation of all other quantities that we consider to be of interest will be presented in a series
of forthcoming papers.



2.4.1 LASSO's generic performance 

The results presented in the above theorem are fairly general and pertain to pretty much any possible scenario
one can imagine. Here we will focus on the so-called "worst-case" scenario or, as we will refer to it, the "generic"
performance scenario. We will now show that $E\|\hat{w}\|_2$ from Theorem 1 can be upper-bounded over the set
of all $\tilde{x}$'s. To that end let us assume that all nonzero components of $\tilde{x}$ are infinite. The optimization problem
from (143) then becomes

$$
\begin{aligned}
\xi_{ov}^{(gen)}(\sigma, g, h) = \max_{\nu, \lambda^{(2)}} \quad & \sigma \sqrt{\|g\|_2^2 - \|h + \nu z^{(1)} - \lambda^{(2)}\|_2^2} \\
\text{subject to} \quad & \nu \ge 0 \\
& \lambda_i^{(2)} = 0, \ n - k + 1 \le i \le n \\
& 0 \le \lambda_i^{(2)} \le 2\nu, \ 1 \le i \le n - k.
\end{aligned}
\tag{147}
$$

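Let $\nu_{gen}$ and $\lambda^{(gen)}$ be the solution of (147) and let $w_{gen}$ be the error vector in the case when all nonzero
components of $\tilde{x}$ are infinite (in Section 2.3.1, for a slightly changed version of (147), $\nu_{gen}$ and $\lambda^{(gen)}$ were
referred to as $\nu_{\ell_1}$ and $\lambda^{(\ell_1)}$). Now, let us assume that some of the nonzero components of $\tilde{x}$ in (143) are finite.
And let as usual $\hat{\nu}$ and $\hat{\lambda}^{(2)}$ be the solution of (143) and let $\|\hat{w}\|_2$ be the norm of the LASSO's error vector.
Since

$$\sigma \sqrt{\|g\|_2^2 - \|h + \hat{\nu} z^{(1)} - \hat{\lambda}^{(2)}\|_2^2} - \sum_{i=n-k+1}^{n} \hat{\lambda}_i^{(2)} \tilde{x}_i \ge \sigma \sqrt{\|g\|_2^2 - \|h + \nu_{gen} z^{(1)} - \lambda^{(gen)}\|_2^2} \tag{148}$$

and $\hat{\lambda}_i^{(2)} \ge 0$, $n - k + 1 \le i \le n$, one has that

$$\|h + \hat{\nu} z^{(1)} - \hat{\lambda}^{(2)}\|_2 \le \|h + \nu_{gen} z^{(1)} - \lambda^{(gen)}\|_2. \tag{149}$$

Furthermore, one then has for the norm of error vectors

$$\|\hat{w}\|_2 = \sigma \frac{\|h + \hat{\nu} z^{(1)} - \hat{\lambda}^{(2)}\|_2}{\sqrt{\|g\|_2^2 - \|h + \hat{\nu} z^{(1)} - \hat{\lambda}^{(2)}\|_2^2}} \le \sigma \frac{\|h + \nu_{gen} z^{(1)} - \lambda^{(gen)}\|_2}{\sqrt{\|g\|_2^2 - \|h + \nu_{gen} z^{(1)} - \lambda^{(gen)}\|_2^2}} = \|w_{gen}\|_2. \tag{150}$$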
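The passage from (149) to (150) relies on the error-norm expression being increasing in the norm that appears in its numerator. A short check is sketched below; the shorthands $t$ and $G$ are introduced only for this illustration and are not symbols used elsewhere in the paper.

```latex
% Shorthand (illustration only): t = \|h + \nu z^{(1)} - \lambda^{(2)}\|_2, G = \|g\|_2, with 0 \le t < G.
\[
  f(t) = \frac{\sigma t}{\sqrt{G^2 - t^2}},
  \qquad
  f'(t) = \sigma\,\frac{(G^2 - t^2) + t^2}{(G^2 - t^2)^{3/2}}
        = \frac{\sigma\, G^2}{(G^2 - t^2)^{3/2}} > 0 .
\]
% Hence t_1 \le t_2 < G implies f(t_1) \le f(t_2), which is the inequality used in (150).
```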

Then the following generic equivalent to Theorem 1 can be established. 

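Theorem 2. Assume the setup of Theorem 1. Consider the following optimization problem:

$$
\begin{aligned}
\xi_{ov}^{(gen)}(\sigma, g, h) = \min_{\nu, \lambda^{(2)}} \quad & \|h + \nu z^{(1)} - \lambda^{(2)}\|_2 \\
\text{subject to} \quad & \nu \ge 0 \\
& \lambda_i^{(2)} = 0, \ n - k + 1 \le i \le n \\
& 0 \le \lambda_i^{(2)} \le 2\nu, \ 1 \le i \le n - k.
\end{aligned}
\tag{151}
$$

Let $\nu_{gen}$ and $\lambda^{(gen)}$ be the solution of (151). Set

$$\|w_{gen}\|_2 = \sigma \frac{\|h + \nu_{gen} z^{(1)} - \lambda^{(gen)}\|_2}{\sqrt{\|g\|_2^2 - \|h + \nu_{gen} z^{(1)} - \lambda^{(gen)}\|_2^2}}. \tag{152}$$

Then:

$$
\begin{aligned}
P\left(\exists\, w_{lasso} \mid \|w_{lasso}\|_2 \in \left((1 - \epsilon_1^{(lasso)}) E\|w_{gen}\|_2,\ (1 + \epsilon_1^{(lasso)}) E\|w_{gen}\|_2\right)\right) &\ge 1 - e^{-\epsilon_2^{(lasso)} n} \\
P\left(\|w_{lasso}\|_2 \le (1 + \epsilon_1^{(lasso)}) E\|w_{gen}\|_2\right) &\ge 1 - e^{-\epsilon_3^{(lasso)} n},
\end{aligned}
\tag{153}
$$

where $\epsilon_1^{(lasso)} > 0$ is an arbitrarily small constant and $\epsilon_2^{(lasso)}$ and $\epsilon_3^{(lasso)}$ are constants dependent on
$\epsilon_1^{(lasso)}$ and $\sigma$ but independent of $n$.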

Proof. Follows from the above discussion, Theorem 1, and by noting that the optimization problems in 
(151) and (147) are equivalent. □ 
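To make the optimization in (151) concrete, the sketch below evaluates its squared optimal value for a single random draw of $h$. It is only an illustration under stated assumptions: $z^{(1)}$ is taken to be the all-ones vector (its definition lies outside this section), the inner minimization over $\lambda^{(2)}$ is done in closed form by soft thresholding, and the outer minimization over $\nu$ uses a plain ternary search. The function names `xi_gen_squared` and `empirical_alpha_w` are hypothetical helpers, not part of the paper.

```python
import math
import random


def xi_gen_squared(h, k, iters=60):
    """Squared optimal value of (151) for one sample h (assumes z^(1) = all ones, 0 < k < n)."""
    n = len(h)
    off = [abs(x) for x in h[: n - k]]  # indices 1..n-k: 0 <= lambda_i <= 2*nu
    on = h[n - k:]                      # indices n-k+1..n: lambda_i = 0

    def objective(nu):
        # For fixed nu, the best lambda_i shrinks |h_i| by nu (soft thresholding).
        shrunk = sum((a - nu) ** 2 for a in off if a > nu)
        return shrunk + sum((x + nu) ** 2 for x in on)

    lo, hi = 0.0, max(off) + 1.0
    for _ in range(iters):              # ternary search: objective is convex in nu
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    return objective(0.5 * (lo + hi))


def empirical_alpha_w(n, k, seed=0):
    """Single-sample estimate of xi^2 / n for i.i.d. standard normal h."""
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return xi_gen_squared(h, k) / n


if __name__ == "__main__":
    ratio = empirical_alpha_w(10000, 1000)
    print("xi^2 / n for k/n = 0.1:", ratio)
```

For large $n$ the printed ratio should be close to the quantity $\alpha_w$ that the corollary following Theorem 2 characterizes analytically; replacing $\|g\|_2^2$ by its mean $\alpha n$ then gives a rough estimate $\sigma\sqrt{\text{ratio}/(\alpha - \text{ratio})}$ of $\|w_{gen}\|_2$.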

The following corollary then provides a quick way of computing the concentrating point of the "worst 
case" norm of the error vector. 

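Corollary 1. Assume the setup of Theorems 1 and 2. Let $\alpha = \frac{m}{n}$ and $\beta_w = \frac{k}{n}$. Then

$$
\begin{aligned}
P\left(\exists\, w_{lasso} \mid \|w_{lasso}\|_2 \in \left((1 - \epsilon_1^{(lasso)}) \sigma \sqrt{\frac{\alpha_w}{\alpha - \alpha_w}},\ (1 + \epsilon_1^{(lasso)}) \sigma \sqrt{\frac{\alpha_w}{\alpha - \alpha_w}}\right)\right) &\ge 1 - e^{-\epsilon_2^{(lasso)} n} \\
P\left(\|w_{lasso}\|_2 \le (1 + \epsilon_1^{(lasso)}) \sigma \sqrt{\frac{\alpha_w}{\alpha - \alpha_w}}\right) &\ge 1 - e^{-\epsilon_2^{(lasso)} n},
\end{aligned}
\tag{154}
$$

where $\alpha_w < \alpha$ is such that

$$(1 - \beta_w) \frac{\sqrt{\frac{2}{\pi}}\, e^{-\left(\mathrm{erfinv}\left(\frac{1 - \alpha_w}{1 - \beta_w}\right)\right)^2}}{\alpha_w} - \sqrt{2}\, \mathrm{erfinv}\left(\frac{1 - \alpha_w}{1 - \beta_w}\right) = 0 \tag{155}$$

and $\epsilon_1^{(lasso)} > 0$ is an arbitrarily small constant and $\epsilon_2^{(lasso)}$ is a constant dependent on $\epsilon_1^{(lasso)}$ and $\sigma$ but
independent of $n$.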

Proof. Let h and be as in Section 2.3.1. Then 

£<,f n V,g,h) = min + \W\\ 2 

subject to v > 

Aj 2) =0,n-k + l<i<n 

< \f } < v, 1 < i < n - k, (156) 
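is equivalent to (154). Moreover

$$E\|w_{gen}\|_2 = \sigma \frac{E\xi_{ov}^{(gen)}(\sigma, g, h)}{\sqrt{E\|g\|_2^2 - E\xi_{ov}^{(gen)}(\sigma, g, h)^2}} = \sigma \sqrt{\frac{\alpha_w}{\alpha - \alpha_w}}, \tag{157}$$

where the characterization of $\alpha_w$ through $E\xi_{ov}^{(gen)}(\sigma, g, h)^2$ is one of the main contributions of [63]. The rest then trivially follows
from (153). □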



Using (155) and (147) one can then for any $\sigma$ and any pair $(\alpha, \beta_w)$ (that is below fundamental
characterization (108)) determine the value of the worst case $E\|w_{lasso}\|_2$ as $\sigma \sqrt{\frac{\alpha_w}{\alpha - \alpha_w}}$. We present the obtained
results in Figure 3. For several fixed values of the worst case $E\|w_{lasso}\|_2$ we determine curves of points $(\alpha, \beta_w)$
for which these fixed values are achieved (of course for any pair that lies below a curve the value of the
corresponding worst case $E\|w_{lasso}\|_2$ is smaller). As can be seen from the plots, the lower the norm-2 of the error
vector the smaller the allowable region for pairs $(\alpha, \beta_w)$.
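As an illustration of how the worst-case value can be obtained numerically from (155), the sketch below solves (155) for $\alpha_w$ given $\beta_w$ and then evaluates $\sigma\sqrt{\alpha_w/(\alpha - \alpha_w)}$. It is a minimal sketch, not the procedure used to produce Figure 3. To stay within the Python standard library (which has `math.erf` but no inverse error function) it uses the substitution $\nu = \sqrt{2}\,\mathrm{erfinv}\big(\frac{1-\alpha_w}{1-\beta_w}\big)$, under which (155) becomes $\nu\,\alpha_w = (1-\beta_w)\sqrt{2/\pi}\,e^{-\nu^2/2}$ with $\alpha_w = 1 - (1-\beta_w)\,\mathrm{erf}(\nu/\sqrt{2})$. The function names are hypothetical.

```python
import math


def alpha_w_general(beta_w, tol=1e-12):
    """Solve (155) for alpha_w given beta_w = k/n, with 0 < beta_w < 1."""

    def alpha_of(nu):
        # alpha_w expressed through nu = sqrt(2) * erfinv((1 - alpha_w) / (1 - beta_w))
        return 1.0 - (1.0 - beta_w) * math.erf(nu / math.sqrt(2.0))

    def residual(nu):
        # (155) multiplied by alpha_w; strictly decreasing in nu on [0, inf)
        gauss = math.sqrt(2.0 / math.pi) * math.exp(-nu * nu / 2.0)
        return (1.0 - beta_w) * gauss - nu * alpha_of(nu)

    lo, hi = 0.0, 50.0  # residual(0) > 0 and residual(50) < 0 for beta_w > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return alpha_of(0.5 * (lo + hi))


def worst_case_error_norm(alpha, beta_w, sigma):
    """Concentrating point sigma * sqrt(alpha_w / (alpha - alpha_w)) from (154)."""
    a_w = alpha_w_general(beta_w)
    if alpha <= a_w:
        raise ValueError("(alpha, beta_w) is not below the fundamental characterization")
    return sigma * math.sqrt(a_w / (alpha - a_w))


if __name__ == "__main__":
    print(worst_case_error_norm(alpha=0.7, beta_w=0.1, sigma=1.0))
```

Bisection is used because the transformed residual is monotone in $\nu$, so no derivative information or external solver is needed.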

The results of the above corollary match those obtained in [7,26] through a state evolution/belief propagation
type of analysis. The above corollary relates to the LASSO from (6) whereas the results from [7,26]
are derived for a somewhat different LASSO from (5). However, as mentioned earlier, in Section 4 we will
establish a nice connection between the LASSO from (6) and one that is fairly similar to (5).



Figure 3: $(\alpha, \beta_w)$ curves as functions of $\rho = \frac{E\|w_{lasso}\|_2}{\sigma}$ for LASSO algorithm from (6)
3 LASSO's performance analysis framework - signed $\tilde{x}$

In this section we show how the LASSO's performance analysis framework developed in the previous section
can be specialized to the case when signals are a priori known to have nonzero components of certain
sign. All major assumptions stated at the beginning of the previous section will continue to hold in this section
as well; namely, we will continue to consider matrices $A$ with i.i.d. standard normal random variables;
elements of $v$ will again be i.i.d. Gaussian random variables with zero mean and variance $\sigma^2$. The main
difference, though, comes in the definition of $\tilde{x}$. We will in this section assume that $\tilde{x}$ is the original $x$ in
(3) that we are trying to recover and that it is any $k$-sparse vector with a given fixed location of its nonzero
elements and with a priori known signs of its elements. Given the statistical context, it will be fairly easy
to see later on that everything that we will present in this section does not depend on which particular
location and which particular combination of signs of nonzero elements are chosen. We therefore for
the simplicity of the exposition and without loss of generality assume that the components $\tilde{x}_1, \tilde{x}_2, \dots, \tilde{x}_{n-k}$
of $\tilde{x}$ are equal to zero and the components $\tilde{x}_{n-k+1}, \tilde{x}_{n-k+2}, \dots, \tilde{x}_n$ of $\tilde{x}$ are greater than or equal to zero.
However, differently from what was assumed in the previous section, we now assume that this information
is a priori known. That essentially means that this information is also known to the solving algorithm. Then
instead of (6) one can consider its better ("signed") version

$$
\begin{aligned}
\min_{x} \quad & \|y - Ax\|_2 \\
\text{subject to} \quad & \|x\|_1 \le \|\tilde{x}\|_1 \\
& x_i \ge 0, \ 1 \le i \le n.
\end{aligned}
\tag{158}
$$

In what follows we will mimic the procedure presented in the previous section, skip all the obvious
parallels, and emphasize the points that are different. The framework that we will present below will again
center around finding the optimal value of the objective function in (158). In the first of the following two
subsections we will create a lower bound on this optimal value (this will essentially amount to creating a
procedure that is analogous to the one presented in Section 2.1). We will then afterwards in the second of
the subsections create an upper bound on this optimal value. As it was done in the case of general $\tilde{x}$ in the
previous section we will in the third subsection show that the two bounds actually match. To make further
writing easier and clearer we set already here

$$
\begin{aligned}
\zeta_{obj+} = \min_{x} \quad & \|y - Ax\|_2 \\
\text{subject to} \quad & \|x\|_1 \le \|\tilde{x}\|_1 \\
& x_i \ge 0, \ 1 \le i \le n.
\end{aligned}
\tag{159}
$$

3.1 Lower-bounding $\zeta_{obj+}$

In this section we present the part of the framework that relates to finding a "high-probability" lower bound
on $\zeta_{obj+}$. As in the previous section we again assume that there is a (if necessary, arbitrarily large) constant
$C_w$ such that

$$P(\|w_{lasso+}\|_2 \le C_w) = 1 - e^{-\epsilon_{C_w} n}. \tag{160}$$

We again start by noting that if one knows that $y = A\tilde{x} + v$ holds then (159) can be rewritten as

$$
\begin{aligned}
\min_{x} \quad & \|v + A\tilde{x} - Ax\|_2 \\
\text{subject to} \quad & \|x\|_1 \le \|\tilde{x}\|_1 \\
& x_i \ge 0, \ 1 \le i \le n.
\end{aligned}
\tag{161}
$$
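After a small change of variables, $x = \tilde{x} + w$, one has an equivalent to (14)

$$
\begin{aligned}
\min_{w} \quad & \left\| A_v \begin{bmatrix} w \\ \sigma \end{bmatrix} \right\|_2 \\
\text{subject to} \quad & \sum_{i=1}^{n} w_i \le 0 \\
& \tilde{x}_i + w_i \ge 0, \ 1 \le i \le n,
\end{aligned}
\tag{162}
$$

where as earlier $A_v = [-A \ \ v]$ is an $m \times (n + 1)$ random matrix with i.i.d. standard normal components.
Let

$$S_+(\sigma, \tilde{x}, C_w) = \left\{ [w^T \ \sigma]^T \in R^{n+1} \ \Big|\ \|w\|_2 \le C_w \ \text{and} \ \sum_{i=1}^{n} w_i \le 0 \ \text{and} \ \tilde{x}_i + w_i \ge 0, \ 1 \le i \le n \right\}. \tag{163}$$

Further, let

$$f_{obj+}(\sigma, w) = \left\| A_v \begin{bmatrix} w \\ \sigma \end{bmatrix} \right\|_2 \tag{164}$$

and set

$$\zeta_{obj+}^{(help)} = \min_{[w^T \sigma]^T \in S_+(\sigma, \tilde{x}, C_w)} f_{obj+}(\sigma, w) = \min_{[w^T \sigma]^T \in S_+(\sigma, \tilde{x}, C_w)} \left\| A_v \begin{bmatrix} w \\ \sigma \end{bmatrix} \right\|_2 = \min_{[w^T \sigma]^T \in S_+(\sigma, \tilde{x}, C_w)} \max_{\|a\|_2 = 1} a^T A_v \begin{bmatrix} w \\ \sigma \end{bmatrix}. \tag{165}$$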

Now, after applying Lemma 1 and following the procedure from the previous section one has 



w 

a 



P( mm (/ow+^.w) + VHIi + ^) > C$- + ) > pf. 



(166) 



where 



+ -P 



mm 

Jw T <7] T eS+(<7,X,C w ) 



w 



+ ^ 2 ||g||2 +^h iWi + 



(167) 



i=l 



.(/) 



As in the previous section we will essentially show that for certain $\zeta_{obj+}^{(l)}$ this probability is close to 1 which will
imply that we have a "high probability" lower bound on $\zeta_{obj+}$. Let

$$\xi_+(\sigma, g, h, \tilde{x}) = \min_{[w^T \sigma]^T \in S_+(\sigma, \tilde{x}, C_w)} \left( \sqrt{\|w\|_2^2 + \sigma^2}\, \|g\|_2 + \sum_{i=1}^{n} h_i w_i \right). \tag{168}$$

Now we split the analysis into two parts. The first one will be the deterministic analysis of $\xi_+(\sigma, g, h, \tilde{x})$
and will be presented in Subsection 3.1.1. In the second part (that will be presented in Subsection 3.1.2) we
will use the results of such a deterministic analysis and continue the above probabilistic analysis applying
various concentration results.
3.1.1 Optimizing $\xi_+(\sigma, g, h, \tilde{x})$

In this section we compute $\xi_+(\sigma, g, h, \tilde{x})$. We first rewrite the optimization problem from (168) in the following
way

$$
\begin{aligned}
\xi_+(\sigma, g, h, \tilde{x}) = \min_{w} \quad & \sqrt{\|w\|_2^2 + \sigma^2}\, \|g\|_2 + \sum_{i=1}^{n} h_i w_i \\
\text{subject to} \quad & \sum_{i=1}^{n} w_i \le 0 \\
& \tilde{x}_i + w_i \ge 0, \ 1 \le i \le n \\
& \sqrt{\|w\|_2^2 + \sigma^2} \le \sqrt{C_w^2 + \sigma^2}.
\end{aligned}
\tag{169}
$$
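The Lagrangian of the above problem then becomes

$$
\begin{aligned}
\mathcal{L}(\nu, \lambda^{(2)}, w, \gamma) = {} & \sqrt{\|w\|_2^2 + \sigma^2}\, \|g\|_2 + \sum_{i=1}^{n} h_i w_i + \nu \sum_{i=1}^{n} w_i \\
& - \sum_{i=n-k+1}^{n} \lambda_i^{(2)} (\tilde{x}_i + w_i) - \sum_{i=1}^{n-k} \lambda_i^{(2)} w_i + \gamma \left( \sqrt{\|w\|_2^2 + \sigma^2} - \sqrt{C_w^2 + \sigma^2} \right).
\end{aligned}
\tag{170}
$$

After a few further arrangements we finally have

$$\mathcal{L}(\nu, \lambda^{(2)}, w, \gamma) = \sqrt{\|w\|_2^2 + \sigma^2}\, (\|g\|_2 + \gamma) + \sum_{i=1}^{n} (h_i + \nu - \lambda_i^{(2)}) w_i - \sum_{i=n-k+1}^{n} \lambda_i^{(2)} \tilde{x}_i - \gamma \sqrt{C_w^2 + \sigma^2}. \tag{171}$$

One can then write the following dual problem of (169)

$$
\begin{aligned}
\xi_+(\sigma, g, h, \tilde{x}) = \max_{\nu, \lambda^{(2)}, \gamma} \min_{w} \quad & \mathcal{L}(\nu, \lambda^{(2)}, w, \gamma) \\
\text{subject to} \quad & \lambda_i^{(2)} \ge 0, \ 1 \le i \le n \\
& \nu \ge 0 \\
& \gamma \ge 0,
\end{aligned}
\tag{172}
$$

where we of course use the fact that strong duality holds. The inner minimization over $w$ is
now doable. Setting the derivatives with respect to $w_i$ to zero one obtains

$$\frac{w\, (\|g\|_2 + \gamma)}{\sqrt{\|w\|_2^2 + \sigma^2}} + (h + \nu z^{(1)} - \lambda^{(2)}) = 0, \tag{173}$$

where $z^{(1)}$ and $\lambda^{(2)}$ are as defined in the previous section. From (173) one then has

$$w\, (\|g\|_2 + \gamma) = -\sqrt{\|w\|_2^2 + \sigma^2}\, (h + \nu z^{(1)} - \lambda^{(2)}) \tag{174}$$

or in a norm form

$$\|w\|_2^2\, (\|g\|_2 + \gamma)^2 = (\|w\|_2^2 + \sigma^2)\, \|h + \nu z^{(1)} - \lambda^{(2)}\|_2^2. \tag{175}$$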
From (175) we then find

$$\|w_{sol+}\|_2 = \sigma \frac{\|h + \nu z^{(1)} - \lambda^{(2)}\|_2}{\sqrt{(\|g\|_2 + \gamma)^2 - \|h + \nu z^{(1)} - \lambda^{(2)}\|_2^2}}, \tag{176}$$

and from (174)

$$w_{sol+} = -\sigma \frac{h + \nu z^{(1)} - \lambda^{(2)}}{\sqrt{(\|g\|_2 + \gamma)^2 - \|h + \nu z^{(1)} - \lambda^{(2)}\|_2^2}}, \tag{177}$$

where $w_{sol+}$ is of course the solution of the inner minimization over $w$. As in the previous section, one
should note that (176) and (177) are of course possible only if $\|g\|_2 + \gamma - \|h + \nu z^{(1)} - \lambda^{(2)}\|_2 > 0$. (Also,
as in the previous section if for $\nu$ and $\lambda^{(2)}$ that are optimal in (172) the condition is not met then for the
corresponding $(\alpha, \beta)$ the worst-case $\|w\|_2$ is infinite with overwhelming probability.) Plugging the value of
$w_{sol+}$ from (177) back in (172) gives

$$
\begin{aligned}
\xi_+(\sigma, g, h, \tilde{x}) = \max_{\nu, \lambda^{(2)}, \gamma} \quad & \sigma \sqrt{(\|g\|_2 + \gamma)^2 - \|h + \nu z^{(1)} - \lambda^{(2)}\|_2^2} - \sum_{i=n-k+1}^{n} \lambda_i^{(2)} \tilde{x}_i - \gamma \sqrt{C_w^2 + \sigma^2} \\
\text{subject to} \quad & \lambda_i^{(2)} \ge 0, \ 1 \le i \le n \\
& \nu \ge 0 \\
& \|g\|_2 + \gamma - \|h + \nu z^{(1)} - \lambda^{(2)}\|_2 \ge 0 \\
& \gamma \ge 0.
\end{aligned}
\tag{178}
$$
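The substitution that leads from (177) to the objective of (178) can be written out explicitly. The following is a sketch of that step; the shorthands $G$, $u$, and $T$ are introduced only for this check and are not notation from the paper.

```latex
% Shorthand (illustration only):
%   G = \|g\|_2 + \gamma,  u = h + \nu z^{(1)} - \lambda^{(2)},  T = \|u\|_2,  with G > T.
\begin{align*}
  \|w_{sol+}\|_2^2 + \sigma^2
    &= \frac{\sigma^2 T^2}{G^2 - T^2} + \sigma^2
     = \frac{\sigma^2 G^2}{G^2 - T^2}, \\
  \sqrt{\|w_{sol+}\|_2^2 + \sigma^2}\; G + u^T w_{sol+}
    &= \frac{\sigma G^2}{\sqrt{G^2 - T^2}} - \frac{\sigma T^2}{\sqrt{G^2 - T^2}}
     = \sigma \sqrt{G^2 - T^2}.
\end{align*}
% The remaining terms of the Lagrangian (171) do not involve w, so they carry over unchanged.
```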
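Now, the maximization over $\gamma$ can be done. After setting the derivative to zero one finds

$$\sigma \frac{\|g\|_2 + \gamma}{\sqrt{(\|g\|_2 + \gamma)^2 - \|h + \nu z^{(1)} - \lambda^{(2)}\|_2^2}} - \sqrt{C_w^2 + \sigma^2} = 0 \tag{179}$$

and after some algebra

$$\gamma_{opt+} = \sqrt{1 + \frac{\sigma^2}{C_w^2}}\, \|h + \nu z^{(1)} - \lambda^{(2)}\|_2 - \|g\|_2, \tag{180}$$

where of course $\gamma_{opt+}$ would be the solution of (178) only if larger than or equal to zero. Alternatively of
course $\gamma_{opt+} = 0$. Now, based on these two scenarios we distinguish two different optimization problems:

1. The "overwhelming" optimization - signed $\tilde{x}$

$$
\begin{aligned}
\xi_{ov+}(\sigma, g, h, \tilde{x}) = \max_{\nu, \lambda^{(2)}} \quad & \sigma \sqrt{\|g\|_2^2 - \|h + \nu z^{(1)} - \lambda^{(2)}\|_2^2} - \sum_{i=n-k+1}^{n} \lambda_i^{(2)} \tilde{x}_i \\
\text{subject to} \quad & \nu \ge 0 \\
& \lambda_i^{(2)} \ge 0, \ 1 \le i \le n.
\end{aligned}
\tag{181}
$$

2. The "non-overwhelming" optimization - signed $\tilde{x}$

$$
\begin{aligned}
\xi_{nov+}(\sigma, g, h, \tilde{x}) = \max_{\nu, \lambda^{(2)}} \quad & \sqrt{C_w^2 + \sigma^2}\, \|g\|_2 - C_w \|h + \nu z^{(1)} - \lambda^{(2)}\|_2 - \sum_{i=n-k+1}^{n} \lambda_i^{(2)} \tilde{x}_i \\
\text{subject to} \quad & \nu \ge 0 \\
& \lambda_i^{(2)} \ge 0, \ 1 \le i \le n.
\end{aligned}
\tag{182}
$$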
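For completeness, the way the objective of (182) arises from (178) once $\gamma_{opt+}$ from (180) is substituted can be sketched as follows. The shorthands $T$ and $s$ are introduced only for this illustration.

```latex
% Shorthand (illustration only): T = \|h + \nu z^{(1)} - \lambda^{(2)}\|_2,  s = \sqrt{1 + \sigma^2 / C_w^2},
% so that (180) reads \|g\|_2 + \gamma_{opt+} = s T.
\begin{align*}
  \sigma \sqrt{(\|g\|_2 + \gamma_{opt+})^2 - T^2}
    &= \sigma T \sqrt{s^2 - 1} = \frac{\sigma^2}{C_w}\, T, \\
  \gamma_{opt+} \sqrt{C_w^2 + \sigma^2}
    &= (sT - \|g\|_2)\, C_w s
     = \Big( C_w + \frac{\sigma^2}{C_w} \Big) T - \sqrt{C_w^2 + \sigma^2}\, \|g\|_2, \\
  \sigma \sqrt{(\|g\|_2 + \gamma_{opt+})^2 - T^2} - \gamma_{opt+} \sqrt{C_w^2 + \sigma^2}
    &= \sqrt{C_w^2 + \sigma^2}\, \|g\|_2 - C_w T .
\end{align*}
% Setting \gamma = 0 in (178) instead gives the objective of (181) directly.
```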



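The "overwhelming" optimization is equivalent to (178) if for its optimal values $\hat{\nu}_+$ and $\hat{\lambda}^{(2+)}$ it holds that

$$\sqrt{1 + \frac{\sigma^2}{C_w^2}}\, \|h + \hat{\nu}_+ z^{(1)} - \hat{\lambda}^{(2+)}\|_2 \le \|g\|_2. \tag{183}$$

We now summarize in the following lemma the results of this subsection.

Lemma 11. Let $\hat{\nu}_+$ and $\hat{\lambda}^{(2+)}$ be the solutions of (181) and analogously let $\tilde{\nu}_+$ and $\tilde{\lambda}^{(2+)}$ be the solutions
of (182). Let $\xi_+(\sigma, g, h, \tilde{x})$ be, as defined in (168), the optimal value of the objective function in (168). Then

$$
\xi_+(\sigma, g, h, \tilde{x}) =
\begin{cases}
\sigma \sqrt{\|g\|_2^2 - \|h + \hat{\nu}_+ z^{(1)} - \hat{\lambda}^{(2+)}\|_2^2} - \sum_{i=n-k+1}^{n} \hat{\lambda}_i^{(2+)} \tilde{x}_i, & \text{if } \sqrt{1 + \frac{\sigma^2}{C_w^2}}\, \|h + \hat{\nu}_+ z^{(1)} - \hat{\lambda}^{(2+)}\|_2 \le \|g\|_2 \\
\sqrt{C_w^2 + \sigma^2}\, \|g\|_2 - C_w \|h + \tilde{\nu}_+ z^{(1)} - \tilde{\lambda}^{(2+)}\|_2 - \sum_{i=n-k+1}^{n} \tilde{\lambda}_i^{(2+)} \tilde{x}_i, & \text{otherwise.}
\end{cases}
\tag{184}
$$

Moreover, let $\hat{w}_+$ be the solution of (168). Then

$$
\hat{w}_+(\sigma, g, h, \tilde{x}) =
\begin{cases}
-\dfrac{\sigma (h + \hat{\nu}_+ z^{(1)} - \hat{\lambda}^{(2+)})}{\sqrt{\|g\|_2^2 - \|h + \hat{\nu}_+ z^{(1)} - \hat{\lambda}^{(2+)}\|_2^2}}, & \text{if } \sqrt{1 + \frac{\sigma^2}{C_w^2}}\, \|h + \hat{\nu}_+ z^{(1)} - \hat{\lambda}^{(2+)}\|_2 \le \|g\|_2 \\
-\dfrac{C_w (h + \tilde{\nu}_+ z^{(1)} - \tilde{\lambda}^{(2+)})}{\|h + \tilde{\nu}_+ z^{(1)} - \tilde{\lambda}^{(2+)}\|_2}, & \text{otherwise,}
\end{cases}
\tag{185}
$$

and thus

$$
\|\hat{w}_+(\sigma, g, h, \tilde{x})\|_2 =
\begin{cases}
\dfrac{\sigma \|h + \hat{\nu}_+ z^{(1)} - \hat{\lambda}^{(2+)}\|_2}{\sqrt{\|g\|_2^2 - \|h + \hat{\nu}_+ z^{(1)} - \hat{\lambda}^{(2+)}\|_2^2}}, & \text{if } \sqrt{1 + \frac{\sigma^2}{C_w^2}}\, \|h + \hat{\nu}_+ z^{(1)} - \hat{\lambda}^{(2+)}\|_2 \le \|g\|_2 \\
C_w, & \text{otherwise.}
\end{cases}
\tag{186}
$$

Proof. The first part follows trivially. The second one follows from (177) by choosing the optimal $\hat{\nu}_+$ and
$\hat{\lambda}^{(2+)}$ or alternatively $\tilde{\nu}_+$ and $\tilde{\lambda}^{(2+)}$. □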

3.1.2 Concentration of $\xi_+(\sigma, g, h, \tilde{x})$

In this section we establish that $\xi_+(\sigma, g, h, \tilde{x})$ concentrates with high probability around its mean. The
following lemma is an analogue to Lemma 3.1.2

Lemma 12. Let $g$ and $h$ be $m$- and $n$-dimensional vectors, respectively, with i.i.d. standard normal variables
as their components. Let $\sigma > 0$ be an arbitrary scalar. Let $\xi_+(\sigma, g, h, \tilde{x})$ be as in (168). Further let $\epsilon_{lip} > 0$
be any constant. Then

$$P\left(|\xi_+(\sigma, g, h, \tilde{x}) - E\xi_+(\sigma, g, h, \tilde{x})| \ge \epsilon_{lip} E\xi_+(\sigma, g, h, \tilde{x})\right) \le \exp\left\{ -\frac{(\epsilon_{lip} E\xi_+(\sigma, g, h, \tilde{x}))^2}{2(2C_w^2 + \sigma^2)} \right\}. \tag{187}$$
Proof. It follows by literally repeating every step of the proof of the corresponding lemma from the previous section. The only difference is that one now
has $S_+(\sigma, \tilde{x}, C_w)$ instead of $S_w(\sigma, \tilde{x}, C_w)$. □

Moreover one then has that $\|h + \hat{\nu}_+ z^{(1)} - \hat{\lambda}^{(2+)}\|_2$ and $\|h + \tilde{\nu}_+ z^{(1)} - \tilde{\lambda}^{(2+)}\|_2$ concentrate as well which
automatically implies that $\hat{w}_+$ also concentrates. More formally, one then has analogues to (187)

P(|||h + z^z« - AP+) || 2 - E\\h + ^z« - A^)|| 2 | > ei norm) £||h + zAz« - AP+") || 2 ) < e"^™'" 



P(|||w+||2-£?||w+||2|>eS w) £7||w+|| 2 ) < e" £ 2 W)n 



(188) 



■ , (norm) n (norm) n , (w) „ ., , (norm) (norm) 

where as usual q > 0, e 2 > 0, and > are arbitrarily small constants and £3 , e\ , 
and e 2 w ^ are constant dependent on e ( raorm ) > 0, e ^ norm ) > q, and > 0, respectively, but independent 
of $n$. After repeating every step between (55) and (64) one arrives at the following analogue to Lemma 5.

Lemma 13. Let $v$ be an $m \times 1$ vector of i.i.d. zero-mean variance $\sigma^2$ Gaussian random variables and let
$A$ be an $m \times n$ matrix of i.i.d. standard normal random variables. Consider an $\tilde{x}$ defined in (7) and a $y$
defined in (3) for $x = \tilde{x}$. Let then $\zeta_{obj+}$ be as defined in (159) and let $w_+$ be the solution of (162). Assume
$P(\|w_+\|_2 \le C_w) \ge 1 - e^{-\epsilon_{C_w} n}$ for an arbitrarily large constant $C_w$ and a constant $\epsilon_{C_w} > 0$ dependent
on $C_w$ but independent of $n$. Then there is a constant $\epsilon_{lower} > 0$ such that

$$P(\zeta_{obj+} \ge \zeta_{obj+}^{(lower)}) \ge (1 - e^{-\epsilon_{lower} n})(1 - e^{-\epsilon_{C_w} n}), \tag{189}$$

where 

*obj+ 

£ + (a, g, h, x) is as defined in ( 168), and eu p , , are all positive arbitrarily small constants. 

Proof. Follows from the previous discussion. □ 

3.2 Upper-bounding $\zeta_{obj+}$

In this section we present a general framework for finding a "high-probability" upper bound on $\zeta_{obj+}$. To
that end, let $r_+$ and $C_{w_{up+}}$ be positive scalars (as in Section 2.2, we in this subsection present a general
framework and take these scalars to be arbitrary; however to make the bound as tight as possible in the
following subsection we will make them take particular values). As earlier, if we can show that there is a
$w \in R^n$ such that $\|\tilde{x} + w\|_1 \le \|\tilde{x}\|_1$ and $\|v - Aw\|_2 \le r_+$ with overwhelming probability then $r_+$ can act
as an upper bound on $\zeta_{obj+}$. We then start by looking at the following optimization problem

min llx + w|h — ||x||i 



= C 1 - zup)EH + (v, g, h, x) - ei h) ^ - (190) 



I A 



I2 < r+ 



w 

" " a 
Xj + Wj > 0, 1 < % < n 



Mt<Cl up+ , (191) 
where $A_v$ is as defined right after (14). If we can show that with overwhelming probability the objective
value of the above optimization problem is negative then $r_+$ will be a valid "high probability" upper-bound
on $\zeta_{obj+}$. Moreover, it will be achieved by a $w$ for which it will hold that $\|w\|_2 \le C_{w_{up+}}$.

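First let us rewrite the objective value of the above optimization problem in a slightly more convenient
form

$$
\begin{aligned}
\min_{w} \quad & \sum_{i=1}^{n} w_i \\
\text{subject to} \quad & \left\| A_v \begin{bmatrix} w \\ \sigma \end{bmatrix} \right\|_2 \le r_+ \\
& \tilde{x}_i + w_i \ge 0, \ 1 \le i \le n \\
& \|w\|_2^2 \le C_{w_{up+}}^2.
\end{aligned}
\tag{192}
$$

Now, we proceed in a fashion similar to the one from Subsection 3.1.1. We first do a slight modification of
the first constraint from (192) in the following way

$$
\begin{aligned}
\min_{w, b} \quad & \sum_{i=1}^{n} w_i \\
\text{subject to} \quad & \|b\|_2^2 \le r_+^2 \\
& \tilde{x}_i + w_i \ge 0, \ 1 \le i \le n \\
& [-A \ \ v] \begin{bmatrix} w \\ \sigma \end{bmatrix} = b \\
& \|w\|_2^2 \le C_{w_{up+}}^2.
\end{aligned}
\tag{193}
$$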

The Lagrange dual of the above problem then becomes 

n n 

£(A( 2 ),z,( 1 ), 7l , 72 ,w,b) = ^w J - £ Af } (x, + w,) 

i=l i=n—k+l 
n—k n 

- E A ? )w * " uWAw + ^ (1)vfT " ^ (1)b + ^(E h " ~ r +) + T2(||w||l - Cl up+ ), (194) 



i=l 



i=l 



where v^> and A^ 2 ) are vectors of Lagrange variables as in previous sections. After rearranging terms we 
further have 



E \ w * 

i=n—k+l 



£(A(V (1) ,7i,72,w,b) 

+ ((»« - A< 2 >) r - v WA)w + A. - A + 71 (E b ? " r D + T2(||w||l - Cl up+ ). (195) 



%=i 
Finally we can write a dual problem to (193)

max min C(X ( - 2 \ 71, 72, w, b) 

A( 2 ),^( 1 ),7i,72 w,b 

subject to \f ) > 0, 1 < i < n 

71 >0, 

72 > 0, (196) 

where we of course use the fact that strong duality holds. Now, after repeating all the steps
from (75) to (84) (wherever we had $2\lambda^{(2)}$ we would now have $\lambda^{(2)}$ and there will be no upper bound on
components of $\lambda^{(2)}$ in the corresponding optimization problems) one obtains an analogue to (84)

n 

- min max ((z« - X^f - ^^a - u^va + \\ 2 r+ + V A^x, 

subject to \f ) > 0, 1 < i < n. (197) 
Now let us define $f_{obj+}$ as

n 

= - min max ((z« - A^) T - A)a - z^W + H^ 1 ) || 2 r+ + V A, (2) Xl 

subject to Af ) > 0, 1 < i < n. (198) 
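Any $r_+$ such that $\lim_{n \to \infty} P(f_{obj+} \ge 0) = 1$ is then a valid "high-probability" upper bound. Set

$$\Lambda^{(2+)} = \{ \lambda^{(2)} \in R^n \mid \lambda_i^{(2)} \ge 0, \ 1 \le i \le n \}, \tag{199}$$

and

$$\xi_{up+}(\sigma, g, h, \tilde{x}, C_{w_{up+}}) = \max_{\lambda^{(2)} \in \Lambda^{(2+)},\, \nu \ge 0} \Big( \sqrt{C_{w_{up+}}^2 + \sigma^2}\, \|g\|_2 - C_{w_{up+}} \|h + \nu z^{(1)} - \lambda^{(2)}\|_2 - \sum_{i=n-k+1}^{n} \lambda_i^{(2)} \tilde{x}_i \Big). \tag{200}$$

After further repeating all the steps between (84) and Lemma 14 (the only difference is that $\lambda^{(2)} \in \Lambda^{(2+)}$ in
the "signed" scenario) one then has the following "signed" analogue to Lemma 14 (which in essence gives
a way of finding an $r_+$ such that $\lim_{n \to \infty} P(f_{obj+} \ge 0) = 1$).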

Lemma 14. Let $v$ be an $m \times 1$ vector of i.i.d. zero-mean variance $\sigma^2$ Gaussian random variables and let
$A$ be an $m \times n$ matrix of i.i.d. standard normal random variables. Consider an $\tilde{x}$ defined in (7) and a $y$
defined in (3) for $x = \tilde{x}$. Let then $\zeta_{obj+}$ be as defined in (159) and let $w_+$ be the solution of (162). There is
a constant $\epsilon_{upper} > 0$ such that

$$P(\zeta_{obj+} \le \zeta_{obj+}^{(upper)}) \ge 1 - e^{-\epsilon_{upper} n}, \tag{201}$$

where 

CSjT = U + eu P )E( up+ (a, g, h, x, C Wup+ ) + e^V^ + ef ^ (202) 

£ up+ (a, g, h, x, C Wup+ ) is as defined in ( 200), eu p , , are all positive arbitrarily small constants, and 
C Wup+ is a constant such that ||w + ||2 < C- Wup+ . 

Proof. Follows from the previous discussion. □ 
3.3 Matching upper and lower bounds

In this section we specialize the general bounds introduced above and show how they match. We will again
divide the presentation into three subsections. In the first of the subsections we will make a connection to the
noiseless "signed" case and show how one can then remove the constraint from (184), (185), and (186).
In the second subsection we will consider a $w$ such that $|\|w\|_2 - \|\hat{w}_+\|_2| > \epsilon_{w_{up}} \|\hat{w}_+\|_2$. We will then
quantify how much the lower bound that can be computed for such a $w$ through the framework presented
in Section 3.1 deviates from the optimal one obtained for $\hat{w}_+$. In the last subsection we will then show that
there will be a $w$ such that the upper bound computed through the framework presented in Section 3.2 will
deviate less. That will in essence establish that upper and lower bounds computed in the previous sections
indeed match.



3.3.1 Connection to the $\ell_1$ optimization of signed $\tilde{x}$

In this subsection we establish a connection between the constraint in (184), (185), and (186) and the fundamental
performance characterization of $\ell_1$ optimization derived in [62] (and of course earlier in the context
of neighborly polytopes in [27]). We first recall the condition from Lemma 11. The condition states

$$\sqrt{1 + \frac{\sigma^2}{C_w^2}}\, \|h + \hat{\nu}_+ z^{(1)} - \hat{\lambda}^{(2+)}\|_2 \le \|g\|_2, \tag{203}$$

where $C_w$ is an arbitrarily large constant and $\hat{\nu}_+$ and $\hat{\lambda}^{(2+)}$ are the solutions of

$$
\begin{aligned}
\max_{\nu, \lambda^{(2)}} \quad & \sigma \sqrt{\|g\|_2^2 - \|h + \nu z^{(1)} - \lambda^{(2)}\|_2^2} - \sum_{i=n-k+1}^{n} \lambda_i^{(2)} \tilde{x}_i \\
\text{subject to} \quad & \lambda_i^{(2)} \ge 0, \ 1 \le i \le n \\
& \nu \ge 0.
\end{aligned}
\tag{204}
$$

Now we note the following equivalent to (204) for the case when nonzero components of $\tilde{x}$ are infinite

$$
\begin{aligned}
\max_{\nu, \lambda^{(2)}} \quad & \sigma \sqrt{\|g\|_2^2 - \|h + \nu z^{(1)} - \lambda^{(2)}\|_2^2} \\
\text{subject to} \quad & \lambda_i^{(2)} \ge 0, \ 1 \le i \le n - k \\
& \lambda_i^{(2)} = 0, \ n - k + 1 \le i \le n \\
& \nu \ge 0.
\end{aligned}
\tag{205}
$$

To make the new observations easily comparable to the corresponding ones from [61,63] we set

$$\bar{h}^+ = [h_{(1)}, h_{(2)}, \dots, h_{(n-k)}, h_{n-k+1}, h_{n-k+2}, \dots, h_n]^T, \tag{206}$$

where $[h_{(1)}, h_{(2)}, \dots, h_{(n-k)}]$ are $[h_1, h_2, \dots, h_{n-k}]$ sorted in increasing order (possible ties in the
sorting process are of course broken arbitrarily). Also we let $z^{(2)}$ be as in the previous section, i.e. let it be
such that $z_i^{(2)} = -z_i^{(1)}$, $n - k + 1 \le i \le n$ and $z_i^{(2)} = z_i^{(1)}$, $1 \le i \le n - k$. It is then relatively easy to see
that the above optimization problem is equivalent to

$$
\begin{aligned}
\max_{\nu, \lambda^{(2)}} \quad & \sigma \sqrt{\|g\|_2^2 - \|\bar{h}^+ - \nu z^{(2)} + \lambda^{(2)}\|_2^2} \\
\text{subject to} \quad & \lambda_i^{(2)} \ge 0, \ 1 \le i \le n - k \\
& \lambda_i^{(2)} = 0, \ n - k + 1 \le i \le n \\
& \nu \ge 0.
\end{aligned}
\tag{207}
$$

Let $\nu_{\ell_1+}$ and $\lambda^{(\ell_1+)}$ be the solution of the above maximization. Further, consider the following "signed"
version of the standard $\ell_1$-optimization

$$
\begin{aligned}
\min_{x} \quad & \|x\|_1 \\
\text{subject to} \quad & Ax = y \\
& x_i \ge 0.
\end{aligned}
\tag{208}
$$

Then, as we showed in [63] and [62], the inequality

$$E\|g\|_2 > E\|\bar{h}^+ - \nu_{\ell_1+} z^{(2)} + \lambda^{(\ell_1+)}\|_2 \tag{209}$$

establishes the following fundamental performance characterization of the $\ell_1$ optimization algorithm from
(208) that could be used instead of LASSO from (158) to recover signed $\tilde{x}$ in (1) (which is a noiseless
version of (3))

$$(1 - \beta_w^+) \frac{\sqrt{\frac{1}{2\pi}}\, e^{-\left(\mathrm{erfinv}\left(2\frac{1 - \alpha_w^+}{1 - \beta_w^+} - 1\right)\right)^2}}{\alpha_w^+} - \sqrt{2}\, \mathrm{erfinv}\left(2\frac{1 - \alpha_w^+}{1 - \beta_w^+} - 1\right) = 0, \tag{210}$$

where of course $\alpha_w^+ = \frac{m}{n}$ and $\beta_w^+ = \frac{k}{n}$. As it is also shown in [63] and [62] both of the quantities under the
expected values in (209) nicely concentrate. Then with overwhelming probability one has that for any pair
$(\alpha, \beta)$ that satisfies (or lies below) the fundamental performance characterization of $\ell_1$ optimization given
in (210)

$$\|g\|_2 > \|\bar{h}^+ - \nu_{\ell_1+} z^{(2)} + \lambda^{(\ell_1+)}\|_2. \tag{211}$$

Moreover, since $\lambda_i^{(2)} \ge 0$, $n - k + 1 \le i \le n$, in (204) one actually has that (211) implies that with
overwhelming probability

$$\|g\|_2 > \|h + \hat{\nu}_+ z^{(1)} - \hat{\lambda}^{(2+)}\|_2, \tag{212}$$

which for sufficiently large $C_w$ is the same as (203). We then in what follows assume that the pair $(\alpha, \beta)$ is such
that it satisfies the "signed" fundamental $\ell_1$ optimization performance characterization (or is in the region
below it) from (210) and therefore proceed by ignoring condition (203).

3.3.2 Deviation from the lower-bound

In this subsection we establish that $\|w_{lasso+}\|_2$ (of course, $w_{lasso+} = \hat{x}_+ - \tilde{x}$, where $\hat{x}_+$ is the solution of
(158)) cannot deviate substantially from $\|\hat{w}_+\|_2$ without substantially affecting the value of the lower bound
on the objective in (158) that is derived in Section 3.1. To that end let us assume that there is a $w_{off+}$ that
is the solution of the LASSO from (158) (or to be slightly more precise that is such that $\hat{x}_+ = \tilde{x} + w_{off+}$,
where obviously $\hat{x}_+$ is the solution of (158)). Further, let $|\|w_{off+}\|_2 - \|\hat{w}_+\|_2| > \epsilon_{w_{up}} \|\hat{w}_+\|_2$, where $\epsilon_{w_{up}}$
is an arbitrarily small constant.

One can then write a "signed" analogue to (112)

$$\xi_{off+}(\sigma, g, h, \tilde{x}, w_{off+}) = \max_{\lambda^{(2)} \in \Lambda^{(2+)},\, \nu \ge 0} \Big( \sqrt{\|w_{off+}\|_2^2 + \sigma^2}\, \|g\|_2 - \|w_{off+}\|_2 \|h + \nu z^{(1)} - \lambda^{(2)}\|_2 - \sum_{i=n-k+1}^{n} \lambda_i^{(2)} \tilde{x}_i \Big). \tag{213}$$

After repeating all the arguments between (112) and Lemma 9 one obtains the following analogue to Lemma
9.

Lemma 15. Let $v$ be an $m \times 1$ vector of i.i.d. zero-mean variance $\sigma^2$ Gaussian random variables and let
$A$ be an $m \times n$ matrix of i.i.d. standard normal random variables. Consider an $\tilde{x}$ defined in (7) and a $y$
defined in (3) for $x = \tilde{x}$. Let then $\zeta_{obj+}$ be as defined in (159) and let $w_{off+}$ be the solution of (162). Let
$\alpha$ and $\beta$ be below the fundamental characterization (210) and let $\hat{w}_+$ be as defined in (185). Assume that
$|\|w_{off+}\|_2 - \|\hat{w}_+\|_2| > \epsilon_{w_{up}} \|\hat{w}_+\|_2$, where $\epsilon_{w_{up}}$ is an arbitrarily small but fixed constant. Then there
would be a constant $\epsilon_{off} > 0$, and arbitrarily small positive constants ($\epsilon_{lip}$ and the two further constants appearing in (215)) such that

P(U j+ > Cg?) > 1 - e-*°ff n , (214) 

where 

& ] = (1 " + 0( "Z UP J ^U+(^g,h,x) - e f ] ^-A 9) ^ (215) 

A{l-\-e Wup ) 

and ^ ov+ (a, g, h, x) is as defined in ( 181 ). 

Proof. Follows from the discussion in Section 2.3.2. □ 
3.3.3 Deviation of the upper bound 

In this section we establish that $\|w_{lasso+}\|_2$ cannot deviate from $\|\hat{w}_+\|_2$ as much as it was assumed in the
previous section which is conceptually enough to make the bounds from Sections 3.1 and 3.2 match. All
arguments from Section 2.3.3 can be repeated again. The only difference will be that in all optimization
problems from Section 2.3.3 one will now have no upper bound on $\lambda_i^{(2)}$, $1 \le i \le n$ (this essentially amounts
to using set $\Lambda^{(2+)}$ instead of set $\Lambda^{(2)}$). One then has a "signed" analogue to (142)

$$
\begin{aligned}
E\xi_{up+}(\sigma, g, h, \tilde{x}, E\|\hat{w}_+\|_2) &= E \max_{\lambda^{(2)} \in \Lambda^{(2+)},\, \nu \ge 0} \Big( \sqrt{(E\|\hat{w}_+\|_2)^2 + \sigma^2}\, \|g\|_2 - E\|\hat{w}_+\|_2\, \|h + \nu z^{(1)} - \lambda^{(2)}\|_2 - \sum_{i=n-k+1}^{n} \lambda_i^{(2)} \tilde{x}_i \Big) \\
&= E \max_{\lambda^{(2)} \in \Lambda^{(2+)},\, \nu \ge 0} \Big( \sqrt{(E\hat{d}_+)^2 + \sigma^2}\, \|g\|_2 - E\hat{d}_+\, \|h + \nu z^{(1)} - \lambda^{(2)}\|_2 - \sum_{i=n-k+1}^{n} \lambda_i^{(2)} \tilde{x}_i \Big) \\
&= E \min_{d \ge 0} \max_{\lambda^{(2)} \in \Lambda^{(2+)},\, \nu \ge 0} \Big( \sqrt{d^2 + \sigma^2}\, \|g\|_2 - d\, \|h + \nu z^{(1)} - \lambda^{(2)}\|_2 - \sum_{i=n-k+1}^{n} \lambda_i^{(2)} \tilde{x}_i \Big) = E\xi_{ov+}(\sigma, g, h, \tilde{x}),
\end{aligned}
\tag{216}
$$

where $\hat{d}_+ = \|\hat{w}_+\|_2$ would be the solution of a "signed" analogue to (125). Following the arguments after
(142) one then has that the assumption of Lemma 13 is unsustainable and that $\|w_{lasso+}\|_2$ cannot deviate
substantially from $\|\hat{w}_+\|_2$. This then implies that with overwhelming probability the objective value of (158)
concentrates around $E\xi_{ov+}(\sigma, g, h, \tilde{x})$ and consequently that $\|w_{lasso+}\|_2$ concentrates around $E\|\hat{w}_+\|_2$.
3.4 Connecting all pieces

In this section we connect all of the above. The following theorem essentially does so.

Theorem 3. Let $v$ be an $m \times 1$ vector of i.i.d. zero-mean variance $\sigma^2$ Gaussian random variables and let
$A$ be an $m \times n$ matrix of i.i.d. standard normal random variables. Further, let $g$ and $h$ be $m \times 1$ and
$n \times 1$ vectors of i.i.d. standard normals, respectively. Consider a $k$-sparse $\tilde{x}$ defined in (7) and a $y$ defined
in (3) for $x = \tilde{x}$. Let the solution of (158) be $\hat{x}_+$ and let the so-called error vector of LASSO from (158)
be $w_{lasso+} = \hat{x}_+ - \tilde{x}$. Let $n$ be large and let constants $\alpha = \frac{m}{n}$ and $\beta = \frac{k}{n}$ be below the fundamental
characterization (210). Consider the following optimization problem:

$$
\begin{aligned}
\xi_{ov+}(\sigma, g, h, \tilde{x}) = \max_{\nu, \lambda^{(2)}} \quad & \sigma \sqrt{\|g\|_2^2 - \|h + \nu z^{(1)} - \lambda^{(2)}\|_2^2} - \sum_{i=n-k+1}^{n} \lambda_i^{(2)} \tilde{x}_i \\
\text{subject to} \quad & \nu \ge 0 \\
& \lambda_i^{(2)} \ge 0, \ 1 \le i \le n.
\end{aligned}
\tag{217}
$$

Let $\hat{\nu}_+$ and $\hat{\lambda}^{(2+)}$ be the solution of (217). Set

$$\|\hat{w}_+\|_2 = \sigma \frac{\|h + \hat{\nu}_+ z^{(1)} - \hat{\lambda}^{(2+)}\|_2}{\sqrt{\|g\|_2^2 - \|h + \hat{\nu}_+ z^{(1)} - \hat{\lambda}^{(2+)}\|_2^2}}. \tag{218}$$

Then:

$$P\left((1 - \epsilon_1^{(lasso)}) E\xi_{ov+}(\sigma, g, h, \tilde{x}) \le \|y - A\hat{x}_+\|_2 \le (1 + \epsilon_1^{(lasso)}) E\xi_{ov+}(\sigma, g, h, \tilde{x})\right) = 1 - e^{-\epsilon_2^{(lasso)} n} \tag{219}$$

and

$$P\left((1 - \epsilon_1^{(lasso)}) E\|\hat{w}_+\|_2 \le \|w_{lasso+}\|_2 \le (1 + \epsilon_1^{(lasso)}) E\|\hat{w}_+\|_2\right) = 1 - e^{-\epsilon_2^{(lasso)} n}, \tag{220}$$

where $\epsilon_1^{(lasso)} > 0$ is an arbitrarily small constant and $\epsilon_2^{(lasso)}$ is a constant dependent on $\epsilon_1^{(lasso)}$ and $\sigma$ but
independent of $n$.

Proof. Follows from the above discussion. □ 
3.4.1 LASSO's generic performance 

In this section we show how the results presented in the above theorem can be adapted to the so-called 
"worst-case" scenario or as we refer to it "generic performance" scenario. Repeating the line of arguments 
from Section 2.4.1 one can establish the following generic equivalent to Theorem 3. 

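Theorem 4. Assume the setup of Theorem 3. Consider the following optimization problem:

$$\xi_{gen+}(\sigma, g, h) = \min_{\nu, \lambda^{(2)}} \ \|h + \nu z^{(1)} - \lambda^{(2)}\|_2$$
$$\text{subject to} \quad \nu \geq 0$$
$$\lambda_i^{(2)} = 0, \ n - k + 1 \leq i \leq n$$
$$\lambda_i^{(2)} \geq 0, \ 1 \leq i \leq n - k. \qquad (221)$$

Let $\hat{\nu}_{gen+}$ and $\hat{\lambda}^{(2)}_{gen+}$ be the solution of (221). Set

$$\|\hat{w}_{gen+}\|_2 = \sigma \frac{\|h + \hat{\nu}_{gen+} z^{(1)} - \hat{\lambda}^{(2)}_{gen+}\|_2}{\sqrt{\|g\|_2^2 - \|h + \hat{\nu}_{gen+} z^{(1)} - \hat{\lambda}^{(2)}_{gen+}\|_2^2}}. \qquad (222)$$

Then:

$$P\left(\exists\, w_{lasso+} \,\big|\, \|w_{lasso+}\|_2 \in \left((1 - \epsilon_1^{(lasso)}) E\|\hat{w}_{gen+}\|_2, \ (1 + \epsilon_1^{(lasso)}) E\|\hat{w}_{gen+}\|_2\right)\right) \geq 1 - e^{-\epsilon_2^{(lasso)} n}$$
$$P\left(\|w_{lasso+}\|_2 \leq (1 + \epsilon_1^{(lasso)}) E\|\hat{w}_{gen+}\|_2\right) \geq 1 - e^{-\epsilon_3^{(lasso)} n}, \qquad (223)$$

where $\epsilon_1^{(lasso)} > 0$ is an arbitrarily small constant and $\epsilon_2^{(lasso)}$ and $\epsilon_3^{(lasso)}$ are constants dependent on
$\epsilon_1^{(lasso)}$ and $\sigma$ but independent of $n$.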

Proof. Follows by the use of the same arguments that were used to establish Theorem 3. □ 

The following corollary then provides a quick way of computing the concentrating point of the "worst 
case" norm of the error vector. 

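Corollary 2. Assume the setup of Theorems 3 and 4. Let $\alpha = \frac{m}{n}$ and $\beta_w^+ = \frac{k}{n}$. Then

$$P\left(\exists\, w_{lasso+} \,\big|\, \|w_{lasso+}\|_2 \in \left((1 - \epsilon_1^{(lasso)}) \sigma \sqrt{\frac{\alpha_w^+}{\alpha - \alpha_w^+}}, \ (1 + \epsilon_1^{(lasso)}) \sigma \sqrt{\frac{\alpha_w^+}{\alpha - \alpha_w^+}}\right)\right) \geq 1 - e^{-\epsilon_2^{(lasso)} n}$$
$$P\left(\|w_{lasso+}\|_2 \leq (1 + \epsilon_1^{(lasso)}) \sigma \sqrt{\frac{\alpha_w^+}{\alpha - \alpha_w^+}}\right) \geq 1 - e^{-\epsilon_3^{(lasso)} n}, \qquad (224)$$

where $\alpha_w^+ < \alpha$ is such that

$$(1 - \beta_w^+) \frac{\sqrt{\frac{1}{2\pi}} \, e^{-\left(\mathrm{erfinv}\left(2\frac{1 - \alpha_w^+}{1 - \beta_w^+} - 1\right)\right)^2}}{\alpha_w^+} - \sqrt{2}\, \mathrm{erfinv}\left(2\frac{1 - \alpha_w^+}{1 - \beta_w^+} - 1\right) = 0, \qquad (225)$$

$\epsilon_1^{(lasso)} > 0$ is an arbitrarily small constant, and $\epsilon_2^{(lasso)}$ and $\epsilon_3^{(lasso)}$ are constants dependent on $\epsilon_1^{(lasso)}$ and
$\sigma$ but independent of $n$.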

Proof. Follows by the use of the same arguments that were used to establish Corollary 1 and a recognition 
that the fundamental characterization of interest in the "signed" case is the one given in (210). □ 

Based on the above corollary one can then, for any $\sigma$ and any pair $(\alpha, \beta_w^+)$ (that is below the fundamental
characterization (225) or alternatively (210)), determine the value of the worst case $E\|w_{lasso+}\|_2$
as $\sigma\sqrt{\frac{\alpha_w^+}{\alpha - \alpha_w^+}}$. We present the obtained results in Figure 4. For several fixed values of the worst case
$E\|w_{lasso+}\|_2$ we determine curves of points $(\alpha, \beta_w^+)$ for which these fixed values are achieved (of course,
for any pair that is below a curve the value of the corresponding worst case $E\|w_{lasso+}\|_2$ is smaller). As
in the previous section, the lower the norm-2 of the error vector the smaller the allowable region for pairs
$(\alpha, \beta_w^+)$. Also, as was the case in the previous section, the results of the above corollary match those
obtained in [7, 26] through a state evolution/belief propagation type of analysis for the "signed" version of
the LASSO from (5) (the signed version of the LASSO from (5), as expected, simply adds
positivity constraints on the components of $x$).
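
The way a contour point is obtained from the corollary can be illustrated numerically. The following is a minimal sketch rather than the code used for Figure 4. It assumes that (225) reads $(1-\beta_w^+)\sqrt{1/(2\pi)}\,e^{-t^2}/\alpha_w^+ - \sqrt{2}\,t = 0$ with $t = \mathrm{erfinv}(2(1-\alpha_w^+)/(1-\beta_w^+) - 1)$, and that the target value of the worst case norm is $\rho = \sqrt{\alpha_w^+/(\alpha - \alpha_w^+)}$ for $\sigma = 1$. All function names are illustrative, and `erfinv` is a small bisection helper because the Python standard library has no inverse error function.

```python
import math


def erfinv(y, tol=1e-13):
    """Inverse of math.erf on (-1, 1), computed by bisection (helper, not stdlib)."""
    lo, hi = -6.0, 6.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def alpha_w_from_rho(alpha, rho):
    """Solve rho = sqrt(alpha_w / (alpha - alpha_w)) for alpha_w (sigma scaled out)."""
    return alpha * rho ** 2 / (1.0 + rho ** 2)


def worst_case_norm(alpha, alpha_w, sigma=1.0):
    """Concentration point sigma * sqrt(alpha_w / (alpha - alpha_w)) from (224)."""
    return sigma * math.sqrt(alpha_w / (alpha - alpha_w))


def lhs_225(alpha_w, beta):
    """Left-hand side of (225) in the form assumed in the lead-in."""
    t = erfinv(2.0 * (1.0 - alpha_w) / (1.0 - beta) - 1.0)
    gauss = math.sqrt(1.0 / (2.0 * math.pi)) * math.exp(-t * t)
    return (1.0 - beta) * gauss / alpha_w - math.sqrt(2.0) * t


def beta_from_alpha_w(alpha_w, iters=60):
    """Bisection for the beta in (0, alpha_w) at which lhs_225 vanishes."""
    lo, hi = 0.0, alpha_w
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lhs_225(alpha_w, mid) > 0.0:
            lo = mid  # lhs is decreasing in beta, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for rho in (2.0, 3.0):
        for alpha in (0.3, 0.5, 0.7):
            a_w = alpha_w_from_rho(alpha, rho)
            beta = beta_from_alpha_w(a_w)
            print(f"rho={rho:.0f} alpha={alpha:.1f} alpha_w={a_w:.3f} beta/alpha={beta / alpha:.4f}")
```

For example, $\alpha = 0.3$ and $\rho = 2$ give $\alpha_w^+ = 0.24$ and a ratio $\beta_w^+/\alpha$ close to the 0.286 listed in Table 3. Bisection is used only because the left-hand side is monotonically decreasing in $\beta_w^+$; any scalar root finder would serve equally well.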




4 Connecting LASSO's from (6) and (5) 

In this section we establish a connection between the LASSO algorithm from (6) that we analyzed in Section
2 and the more well-known form of the LASSO from (5). Instead of the well-known (5) we will consider its slight
modification

$$\min_{x} \ \|y - Ax\|_2 + \lambda_{lasso}\|x\|_1. \qquad (226)$$

Both LASSO's, (6) and (226), (as well as the one from (5)) rely on some type of prior knowledge that can
be available about $A$, $v$, or $\tilde{x}$. In (6) we assumed that one knows $\|\tilde{x}\|_1$ (of course, if one has no knowledge
of $\|\tilde{x}\|_1$ the LASSO from (6) simply cannot be run). On the other hand, the LASSO from (226) (as well as the
one from (5)) requires that one sets in advance the parameter $\lambda_{lasso}$, which can be a tough task if there is no
a priori knowledge about $A$, $v$, or $\tilde{x}$. Now, even if there is some a priori knowledge available about these
objects, there are still many ways in which one can set $\lambda_{lasso}$. Below we show a particular way of setting
$\lambda_{lasso}$ in (226) that makes the LASSO's from (6) and (226) essentially equivalent (of course, as long as one
is interested in the performance measures discussed in this paper). In the interest of saving space we will sketch
only the key arguments without going into tedious details similar to the ones presented in earlier sections.
(All that we mention below can be made precise, though; in fact, one can pretty much reach the same level
of exactness demonstrated in Sections 2 and 3; however, the length of the precise probabilistic arguments
would equal (if not exceed) the length of the arguments presented in Sections 2 and 3.)

Now, let $\lambda_{lasso}$ in (226) be such that $\lambda_{lasso} = E\hat{\nu}$ where $\hat{\nu}$ is the solution of (42). Then (226) becomes

$$\min_{x} \ \|y - Ax\|_2 + E\hat{\nu}\|x\|_1, \qquad (227)$$

or in a more convenient form

$$\min_{x} \ \|y - Ax\|_2 + E\hat{\nu}\|x\|_1 - E\hat{\nu}\|\tilde{x}\|_1. \qquad (228)$$

This could be rewritten in a way analogous to (14) as 

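$$\min_{w} \ \left\| A_v \begin{bmatrix} w \\ \sigma \end{bmatrix} \right\|_2 + E\hat{\nu}\|\tilde{x} + w\|_1 - E\hat{\nu}\|\tilde{x}\|_1, \qquad (229)$$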

where A v is as in (14). One can then repeat all arguments from the beginning of Section 2.1 (essentially 
those before Section 2.1.1) to arrive at the following analogue of (23) 

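$$\xi_{conn}(\sigma, g, h, \tilde{x}) = \min_{w} \ \sqrt{\|w\|_2^2 + \sigma^2}\, \|g\|_2 + \sum_{i=1}^{n} h_i w_i + E\hat{\nu}\|\tilde{x} + w\|_1 - E\hat{\nu}\|\tilde{x}\|_1$$
$$\text{subject to} \quad \sqrt{\|w\|_2^2 + \sigma^2} \leq \sqrt{C_w^2 + \sigma^2}. \qquad (230)$$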

Now, one should note that $E\hat{\nu}$ in the above optimization is chosen as the "optimal" $\nu$ in the Lagrange dual of (23)
(it is actually the concentrating point of the optimal one; to make this really precise one would need to go
through all the probabilistic arguments of Section 2 plus some more). One then has that the arguments
from Section 2.1 (essentially an appropriate repetition of those that follow (23)) will produce a lower
bound on the objective of (228) that is with overwhelming probability arbitrarily close to the one derived
in Lemma 5. The arguments from Section 2.2 related to the upper bound can be trivially repeated as well
since the negativity of the objective in (67) implies that $r$ is also an upper bound on the optimal value of the
objective in (228). The matching arguments from Section 2.3 then follow as well. Now, if one lets $w_{conn}$ be
the solution of (229), then with overwhelming probability $\|w_{conn}\|_2$ concentrates around $E\|\hat{w}\|_2$, where $\hat{w}$
is as defined in Theorem 1.

For the signed case the arguments are the same, only instead of $E\hat{\nu}$ in (227), (228), and (229) one should
use $E\hat{\nu}_+$ where $\hat{\nu}_+$ is the solution of (181). Also, as is probably obvious, this time $\|w_{conn}\|_2$ concentrates
around $E\|\hat{w}_+\|_2$ where $\hat{w}_+$ is as defined in Theorem 3.
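
As a brief aid for the step from (226) to (229), the substitution behind it can be written out explicitly. This is only a sketch: it assumes that (3) has the form $y = A\tilde{x} + v$, it writes $x = \tilde{x} + w$, and it uses the generic symbol $\lambda$ for the weight chosen above; the precise definition of $A_v$ is the one given in (14) and is not re-derived here.

```latex
\begin{align*}
% subtracting a constant does not change the minimizer, so (227) and (228) share their solution
\arg\min_{x}\; \|y - Ax\|_2 + \lambda\|x\|_1
  &= \arg\min_{x}\; \|y - Ax\|_2 + \lambda\|x\|_1 - \lambda\|\tilde{x}\|_1, \\
% with x = \tilde{x} + w the residual depends only on the error vector w and the noise v
y - Ax &= A\tilde{x} + v - A(\tilde{x} + w) = v - Aw .
\end{align*}
```

Under these assumptions the optimal value of (228) is the optimal value of (227) shifted by the constant $\lambda\|\tilde{x}\|_1$, and since $w = 0$ is admissible it can never exceed $\|v\|_2$.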



5 A relation between a LASSO and an SOCP 

In this section we show that there is an SOCP equivalent to the LASSO from (6) (as long as the norm-2 of 
the error vector is a performance measure of interest). To that end let us recall that an SOCP algorithm for 
finding an approximation of x if A and y from (3) are known can be (see, e.g. [13]) 

$$\min_{x} \ \|x\|_1$$
$$\text{subject to} \quad \|y - Ax\|_2 \leq r_{socp}. \qquad (231)$$

The choice of $r_{socp}$ critically impacts the outcome of the above optimization. In fact, more is true: the choice
of $r_{socp}$ heavily depends on what type of approximation error one is looking for. As we have mentioned in
Section 1, a popular choice for $r_{socp}$ is the smallest quantity that is with high probability larger than $\|v\|_2$.
There are probably many reasons for such a choice. One of them is that it would with high probability
guarantee that the original $\tilde{x}$ in (3) is permissible in (231). Now, if one is looking for an $x$ that will be close
in norm-2 to the original $\tilde{x}$ then it is not necessary to look for the original $\tilde{x}$ itself (especially so given that
finding the original $\tilde{x}$ is in general pretty much impossible). So if one gives up on that, then the value of $r_{socp}$ can go
even lower than the smallest quantity larger (with overwhelming probability) than $\|v\|_2$. One should also
note that by lowering $r_{socp}$ one would give up not only the possibility of finding $\tilde{x}$ (which is tiny anyway) but also,
highly likely, more when it comes to the structure of the solution vector. This is of course a problem on its
own that requires a thorough discussion. However, since we now look only at the norm of the error vector
as a performance measure, we stop short of pursuing this discussion here any further.

To go along these lines we choose

$$r_{socp} = E\xi_{up}(\sigma, g, h, E\|\hat{w}\|_2) = E\xi_{ov}(\sigma, g, h, \tilde{x}) \leq E\xi_{ob}(\sigma, g, h) \leq \sigma\sqrt{m} = E\|v\|_2. \qquad (232)$$

Now, let $\hat{x}_{socp}$ be the solution of (231) (with $r_{socp}$ as in (232)). Further, let $\hat{x}_{socp} = w_{socp} + \tilde{x}$. Let $\hat{w}$ be as
defined in Theorem 1. Then, as shown in Sections 2.1, 2.2, and 2.3, $\|w_{socp}\|_2$ concentrates around $E\|\hat{w}\|_2$.
Basically, the argument is that if $\left| \|w_{socp}\|_2 - \|\hat{w}\|_2 \right| > \epsilon_{w_{up}}\|\hat{w}\|_2$ for a fixed arbitrarily small positive $\epsilon_{w_{up}}$
and $\|\tilde{x} + w_{socp}\|_1 \leq \|\tilde{x}\|_1$ then with overwhelming probability $\|y - A\hat{x}_{socp}\|_2 > r_{socp}$. On the other hand,
as was also established in Sections 2.1, 2.2, and 2.3, there is a $w$ (whose norm concentrates around $E\|\hat{w}\|_2$
with overwhelming probability) such that $\|\tilde{x} + w\|_1 \leq \|\tilde{x}\|_1$ and $\|y - A\hat{x}_{socp}\|_2 \leq r_{socp}$. This essentially
establishes that if one chooses $r_{socp}$ in (231) as suggested in (232) then the norm-2 of the error vector will
be the same as the norm-2 of the error vector one obtains through the LASSO's from (6) and (226) (the latter
one of course with an appropriate choice of $\lambda_{lasso}$).

As we hinted above, what we presented here is only a characterization of a particular performance measure
of an SOCP algorithm (the same is of course true for the LASSO algorithms). Whether such a
performance measure is adequate is a whole other story that we will explore in more detail elsewhere.

6 Numerical results 

In this section we present a set of numerical results related to the theoretical predictions that we derived in 
earlier sections. We will divide the presentation into two groups: 1) the set of results that will relate to the 
general (unsigned) unknown sparse vectors and 2) the set of results that will relate to signed unknown sparse 
vectors. To make scaling easier, in all experiments we set $\sigma = 1$. We also assumed that nonzero components
of x are all of equal and large magnitude. For the concreteness we set this magnitude to be ^=2. For every 
setup that we discuss below we ran 100 numerical experiments. 
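
A single run of such an experiment can be sketched as follows. This is a hedged illustration, not the code behind the reported numbers: it assumes that (6) is the problem of minimizing $\|y - Ax\|_2$ subject to $\|x\|_1 \leq \|\tilde{x}\|_1$, it uses plain projected gradient descent as a stand-in solver (the paper does not say which solver was used), and the small dimensions, the magnitude value 10 and all function names are hypothetical choices made only to keep the example fast.

```python
import math
import random


def norm2(u):
    return math.sqrt(sum(t * t for t in u))


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r for a matrix stored as a list of rows."""
    return [sum(row[j] * ri for row, ri in zip(A, r)) for j in range(len(A[0]))]


def project_l1_ball(u, radius):
    """Euclidean projection of u onto {x : ||x||_1 <= radius} (sort-based)."""
    if sum(abs(t) for t in u) <= radius:
        return list(u)
    mags = sorted((abs(t) for t in u), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, mag in enumerate(mags, start=1):
        cumsum += mag
        t = (cumsum - radius) / j
        if mag - t > 0:
            theta = t  # threshold of the last index that stays active
    return [math.copysign(max(abs(t) - theta, 0.0), t) for t in u]


def lasso_l1_constrained(A, y, radius, iters=300):
    """Projected gradient on 0.5*||y - Ax||_2^2, which has the same minimizer as (6)."""
    step = 1.0 / sum(a * a for row in A for a in row)  # 1/||A||_F^2, a safe step size
    x = [0.0] * len(A[0])
    for _ in range(iters):
        resid = [yi - ai for yi, ai in zip(y, matvec(A, x))]
        grad = [-g for g in rmatvec(A, resid)]
        x = project_l1_ball([xi - step * gi for xi, gi in zip(x, grad)], radius)
    return x


def run_experiment(n=40, m=20, k=4, sigma=1.0, magnitude=10.0, seed=0):
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    x_true = [0.0] * (n - k) + [magnitude] * k  # last k components are nonzero
    v = [rng.gauss(0.0, sigma) for _ in range(m)]
    y = [ax + vi for ax, vi in zip(matvec(A, x_true), v)]
    radius = sum(abs(t) for t in x_true)  # ||x_true||_1 is assumed known
    x_hat = lasso_l1_constrained(A, y, radius)
    err = norm2([a - b for a, b in zip(x_hat, x_true)])
    resid = norm2([yi - ai for yi, ai in zip(y, matvec(A, x_hat))])
    return x_hat, radius, err, resid, norm2(y)


if __name__ == "__main__":
    x_hat, radius, err, resid, y_norm = run_experiment()
    print("error norm:", err, "residual norm:", resid)
```

With dimensions this small the output will not match the large-$n$ predictions; the sketch only shows the structure of one trial (draw $A$, $\tilde{x}$ and $v$, solve, record the error norm and the residual), which would then be averaged over the 100 runs.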

6.1 Numerical results related to general x 

In this subsection we will present numerical results that relate to the theoretical ones created in Sections 2
and 4. We will consider two groups of $(\alpha, \beta_w)$ regimes, one that we will refer to as the low $(\alpha, \beta_w)$ regime
and the other that we will refer to as the high $(\alpha, \beta_w)$ regime.

1) Low $(\alpha, \beta_w)$ regime: $\rho = E\|w_{lasso}\|_2 = 2$

We ran a carefully designed set of experiments intended to show a specific behavior of the LASSO's from
(6) and (227) in what we will refer to as the low $(\alpha, \beta_w)$ regime. For $\alpha \in \{0.3, 0.5, 0.7\}$ we determined
three values of $\beta_w$ from the contour LASSO line that corresponds to $\rho = 2$ in the figure given in Section 2.
We then ran (6) assuming that $\|\tilde{x}\|_1$ is known and (227) using the theoretical value for $E\hat{\nu}$ where, as mentioned
in Section 4, $\hat{\nu}$ is the solution of (42). We call the optimal value of the objective in (228) $\zeta_{conn}$ (this value
is the optimal value of (227) shifted by a constant). Also, for this set of experiments we set $n = 2000$.
Obtained results are presented in Table 1. The theoretical values for any of the simulated quantities in any
of the simulated scenarios are given in parallel, after the slash. We observe a solid agreement between
the theoretical predictions and the results obtained through numerical experiments.

Table 1: Experimental/theoretical results for the noisy recovery through LASSO's; $\sigma = 1$, $\rho = E\|w_{lasso}\|_2 = 2$; (6) and (227) were run 100 times with $n = 2000$

$\alpha$ | $\beta_w/\alpha$ | $E\hat{\nu}$ | $E\zeta_{conn}/\sqrt{n}$ | $E\|w_{conn}\|_2$ | $E\zeta_{obj}/\sqrt{n}$ | $E\|w_{lasso}\|_2$
---------|------------------|--------------|--------------------------|-------------------|-------------------------|-------------------
0.3      | 0.21             | 1.3141       | 0.2444/0.2449            | 2.0225/2          | 0.2449/0.2449           | 2.0188/2
0.5      | 0.27             | 1.0227       | 0.3159/0.3162            | 2.0058/2          | 0.3162/0.3162           | 2.0018/2
0.7      | 0.33             | 0.7959       | 0.3717/0.3742            | 2.0168/2          | 0.3721/0.3742           | 2.0155/2

2) High $(\alpha, \beta_w)$ regime: $\rho = E\|w_{lasso}\|_2 = 3$

We also ran a carefully designed set of experiments intended to show a specific behavior of the LASSO's
from (6) and (227) in what we will refer to as the high $(\alpha, \beta_w)$ regime. For $\alpha \in \{0.3, 0.5, 0.7\}$ we now
determined three values of $\beta_w$ from the contour LASSO line that corresponds to $\rho = 3$ in the figure given
in Section 2. We then again ran (6) assuming that $\|\tilde{x}\|_1$ is known and (227) using the theoretical values for
$E\hat{\nu}$. For the scenario when $\alpha = 0.3$ we set $n = 3000$ while for the scenarios with the other two values of $\alpha$
we set $n = 2000$. Obtained results are presented in Table 2. The theoretical values for any of the simulated
quantities in any of the simulated scenarios are again given in parallel, after the slash. We again observe a
solid agreement between the theoretical predictions and the results obtained through numerical experiments.

Table 2: Experimental/theoretical results for the noisy recovery through LASSO's; $\sigma = 1$, $\rho = E\|w_{lasso}\|_2 = 3$; (6) and (227) were run 100 times

$\alpha$ | $\beta_w/\alpha$ | $E\hat{\nu}$ | $E\zeta_{conn}/\sqrt{n}$ | $E\|w_{conn}\|_2$ | $E\zeta_{obj}/\sqrt{n}$ | $E\|w_{lasso}\|_2$
---------|------------------|--------------|--------------------------|-------------------|-------------------------|-------------------
0.3      | 0.249            | 1.2508       | 0.1699/0.1732            | 3.1714/3          | 0.1705/0.1732           | 3.1507/3
0.5      | 0.325            | 0.9477       | 0.2231/0.2236            | 3.0560/3          | 0.2239/0.2236           | 3.0405/3
0.7      | 0.41             | 0.7046       | 0.2579/0.2646            | 3.1166/3          | 0.2585/0.2646           | 3.1069/3

6.2 Numerical results related to signed x 

In this subsection we will present numerical results that relate to the theoretical ones created in Sections 3
and 4. We will again consider two groups of $(\alpha, \beta_w^+)$ regimes, one that we will refer to as the low $(\alpha, \beta_w^+)$
regime and the other that we will refer to as the high $(\alpha, \beta_w^+)$ regime.

1) Low $(\alpha, \beta_w^+)$ regime: $\rho = E\|w_{lasso+}\|_2 = 2$

We first ran a set of experiments intended to show a specific behavior of the LASSO's from (158) and
(227) in what we will refer to as the low $(\alpha, \beta_w^+)$ regime. For $\alpha \in \{0.3, 0.5, 0.7\}$ we determined three
values of $\beta_w^+$ from the contour LASSO line that corresponds to $\rho = 2$ in the figure given in Section 3. We
then ran (158) assuming that $\|\tilde{x}\|_1$ is known and (227) using the theoretical value for $E\hat{\nu}_+$ where, as mentioned
in Section 4, $\hat{\nu}_+$ is the solution of (181). Also, when running (227) we now added positivity constraints
on the elements of $x$. We call the optimal value of the objective in (228) $\zeta_{conn+}$. When $\alpha = 0.7$ we set
$n = 1500$ while for the other two values of $\alpha$ we set $n = 2000$. Obtained results are presented in Table 3.
The theoretical values for any of the simulated quantities in any of the simulated scenarios are as usual given
in parallel, after the slash. We once again observe a solid agreement between the theoretical predictions
and the results obtained through numerical experiments.

Table 3: Experimental/theoretical results for the noisy recovery through LASSO's; $\sigma = 1$, $\rho = E\|w_{lasso+}\|_2 = 2$; (158) and (227) were run 100 times

$\alpha$ | $\beta_w^+/\alpha$ | $E\hat{\nu}_+$ | $E\zeta_{conn+}/\sqrt{n}$ | $E\|w_{conn+}\|_2$ | $E\zeta_{obj+}/\sqrt{n}$ | $E\|w_{lasso+}\|_2$
---------|--------------------|----------------|---------------------------|--------------------|--------------------------|--------------------
0.3      | 0.286              | 0.9592         | 0.2454/0.2449             | 1.9939/2           | 0.2461/0.2449            | 1.9876/2
0.5      | 0.3842             | 0.6516         | 0.3133/0.3162             | 2.0229/2           | 0.3140/0.3162            | 2.0177/2
0.7      | 0.4849             | 0.4292         | 0.3786/0.3742             | 1.9947/2           | 0.3794/0.3742            | 1.9886/2

2) High $(\alpha, \beta_w^+)$ regime: $\rho = E\|w_{lasso+}\|_2 = 3$

As in the previous subsection, we also ran a carefully designed set of experiments intended to show a
specific behavior of the LASSO's from (158) and (227) in what we will refer to as the high $(\alpha, \beta_w^+)$ regime.
Following further the methodology of the previous subsection, for $\alpha \in \{0.3, 0.5, 0.7\}$ we determined three
values of $\beta_w^+$ from the contour LASSO line that corresponds to $\rho = 3$ in the figure given in Section 3. We
then again ran (158) and (227) (when running (227) we of course again added positivity constraints and we
again used the theoretical value for $E\hat{\nu}_+$). When $\alpha = 0.7$ we set $n = 1500$ while for the other two values of
$\alpha$ we set $n = 2000$. Obtained results are presented in Table 4. The theoretical values for all quantities of
interest are again given in parallel, after the slash. We once again observe a solid agreement between
the theoretical predictions and the results obtained through numerical experiments.

Table 4: Experimental/theoretical results for the noisy recovery through LASSO's; $\sigma = 1$, $\rho = E\|w_{lasso+}\|_2 = 3$; (158) and (227) were run 100 times

$\alpha$ | $\beta_w^+/\alpha$ | $E\hat{\nu}_+$ | $E\zeta_{conn+}/\sqrt{n}$ | $E\|w_{conn+}\|_2$ | $E\zeta_{obj+}/\sqrt{n}$ | $E\|w_{lasso+}\|_2$
---------|--------------------|----------------|---------------------------|--------------------|--------------------------|--------------------
0.3      | 0.3423             | 0.8197         | 0.1713/0.1732             | 3.1213/3           | 0.1723/0.1732            | 3.0898/3
0.5      | 0.4672             | 0.5757         | 0.2245/0.2236             | 2.9983/3           | 0.2255/0.2236            | 2.9860/3
0.7      | 0.5971             | 0.3470         | 0.2644/0.2646             | 3.0373/3           | 0.2654/0.2646            | 3.0218/3
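
The theoretical entries in the objective columns of Tables 1 to 4 can be cross-checked with a few lines of arithmetic. The sketch below rests on assumptions that are not spelled out in this section: writing $u$ for the optimal vector inside the norms in (222), the error norm is $\sigma\|u\|_2/\sqrt{\|g\|_2^2 - \|u\|_2^2}$ and the residual is taken to be $\sigma\sqrt{\|g\|_2^2 - \|u\|_2^2}$, with $\|g\|_2^2$ replaced by its concentration point $m = \alpha n$. Eliminating $\|u\|_2$ for a prescribed $\rho$ then gives a residual of $\sigma\sqrt{\alpha n/(1+\rho^2)}$. The function name is illustrative.

```python
import math


def objective_per_sqrt_n(alpha, rho, sigma=1.0):
    """sigma * sqrt(alpha / (1 + rho^2)): residual norm divided by sqrt(n),
    assuming ||g||_2^2 is replaced by alpha * n."""
    return sigma * math.sqrt(alpha / (1.0 + rho * rho))


if __name__ == "__main__":
    for rho in (2.0, 3.0):
        row = [round(objective_per_sqrt_n(alpha, rho), 4) for alpha in (0.3, 0.5, 0.7)]
        print(f"rho = {rho:.0f}:", row)
```

Because this expression depends only on $\alpha$, $\rho$ and $\sigma$, the same theoretical value appears in the general and in the signed tables for a given pair $(\alpha, \rho)$.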

7 Discussion 

In this paper we considered "noisy" under-determined systems of linear equations with sparse solutions. 
We looked from a theoretical point of view at classical polynomial-time LASSO algorithms. Under the 
assumption that the system matrix A has i.i.d. standard normal components, we created a general framework 
that can be used to characterize various quantities of interest in analyzing the LASSO's performance. Among 
other things, the framework enables one to precisely estimate the norm of the error vector in "noisy"
under-determined systems. Moreover, it can do so for any given $k$-sparse vector $\tilde{x}$.

While many quantities of interest in LASSO recovery can be computed through the mechanism presented
here, to demonstrate its power we focused in this introductory paper only on what we called the
LASSO's generic performance. We essentially established the precise values of the "worst-case" norm-2
of the error vector. On the other hand, using the framework one can create a massive set of results for
the LASSO's non-generic or, as we will refer to it, problem dependent performance. However, this goes
significantly beyond the scope of an introductory paper. We will dissect problems from this direction in
detail in one of the forthcoming papers. Also, the existence of an SOCP type of recovery algorithm that
achieves the same norm-2 of the error vector as the LASSO does followed as a by-product of our analysis.

As for the applications, further developments are pretty much unlimited. Literally every problem that
we were able to solve in the so-called noiseless case (and there was hardly any that we were not) through
the mechanisms from [63] and [62] can now be handled in the noisy case as well. For example, quantifying
the performance of LASSO or SOCP optimization problems in solving "noisy" systems with a special structure
of the solution vector (block-sparse, binary, box-constrained, low-rank matrix, partially known locations of
nonzero components, just to name a few), or "noisy" systems with noisy (or approximately sparse) solution
vectors, can then easily be handled to an ultimate precision. In a series of forthcoming papers we will present
some of these applications.